

Universitext 


Editors 


F.W. Gehring 
P.R. Halmos 
C.C. Moore 


A.I. Kostrikin


Introduction to 
Algebra 


Translated from the Russian by 
Neal Koblitz 


Springer-Verlag 
New York Heidelberg Berlin 




A.I. Kostrikin
Moscow State University
Department of Mathematics
Moscow, U.S.S.R. 117234

Neal Koblitz
Department of Mathematics
University of Washington
Seattle, Washington 98195, U.S.A.


Editorial Board 


F.W. Gehring
University of Michigan
Department of Mathematics
Ann Arbor, Michigan 48104
U.S.A.

P.R. Halmos
Indiana University
Department of Mathematics
Bloomington, Indiana 47401
U.S.A.

C.C. Moore
University of California at Berkeley
Department of Mathematics
Berkeley, California 94720
U.S.A.


AMS Classifications: 13-01, 16-01, 20-01 


Library of Congress Cataloging in Publication Data 


Kostrikin, A. I. (Aleksei Ivanovich)
Introduction to algebra. 


(Universitext) 
Translation of: Vvedenie v algebru. 
Bibliography: p. 
Includes index. 
1. Algebra. I. Title. 
QA154.2.K6713 512 82-5534 
AACR2 


© 1982 by Springer-Verlag New York, Inc. 

All rights reserved. No part of this book may be translated or reproduced in 
any form without written permission from Springer-Verlag, 175 Fifth Avenue, 
New York, New York 10010, U.S.A. 


Printed in the United States of America 


9 8 7 6 5 4 3 2 1


ISBN 0-387-90711-4 Springer-Verlag New York Heidelberg Berlin 
ISBN 3-540-90711-4 Springer-Verlag Berlin Heidelberg New York 


A Note on the English Edition 


Every textbook is written taking into account the traditions in a given
university or, more generally, in the universities of a given country. My algebra
textbook is no exception. At the same time, the exchange of ideas in the area of
mathematics teaching in different countries is no less important than the exchange
of ideas in research. The Soviet Union, in particular, has accumulated a rich
experience in translating into Russian monographs and textbooks published in other
countries.

It will be a pleasure, as well as an honor, for me to see this textbook
translated into English, under the auspices of Springer publishing house, which is
famous throughout the mathematical world. I would like to express my deep thanks
to Neal Koblitz for his excellent translation.


Moscow, November 1981 A. Kostrikin




Translator’s Preface 


This textbook, written by a dedicated and successful pedagogue who developed 
the present undergraduate algebra course at Moscow State University, differs in 
several respects from other algebra textbooks available in English. The book 
reflects the Soviet approach to teaching mathematics with its emphasis on applications 
and problem-solving -- note that the mathematics department in Moscow is called the
"Mechanics-Mathematics" Faculty. In the first place, Kostrikin's textbook motivates
many of the algebraic concepts by practical examples, for instance, the heated plate 
problem used to introduce linear equations in Chapter 1. In the second place, there 
are a large number of exercises, so that the student can convert a vague passive 
understanding to active mastery of the new ideas. These problems are intended to be 
challenging but doable by the student; the harder ones have hints at the back of the 
book. This feature also makes the book ideally suited for learning algebra on one's 
own outside of the framework of an organized course. In the third place, the author 
treats material which is usually not part of an elementary course but which is 
fundamental in applications. Thus, Part II includes an introduction to the classical
groups and to representation theory. With many American colleges now trying to 
bring their undergraduate mathematics curriculum closer to applications, it seems 
worthwhile to translate Soviet textbooks which reflect their greater experience in 
this area of mathematical pedagogy. 

I would like to express my deep appreciation to Robert Cornell for his careful 
reading of Part I and his many suggestions for increasing clarity and readability. 

I would also like to thank Barbara Moody for her meticulous typing of the manuscript. 


Neal Koblitz, Seattle, February 1982 


Contents 


Foreword
Advice to the reader

PART I. Foundations of Algebra
Further reading

Chapter 1. Sources of algebra
  §1. Algebra in brief
  §2. Some model problems
    1. Solvability of equations in radicals
    2. The states of a multi-atom molecule
    3. Coding information
    4. The heated plate problem
  §3. Systems of linear equations. The first steps
    1. Terminology
    2. Equivalence of linear systems
    3. Reducing to step form
    4. Studying a system of linear equations
    5. Some remarks and examples
  §4. Determinants of small order
    Exercises
  §5. Sets and maps
    1. Sets
    2. Mappings
    Exercises
  §6. Equivalence relations. Quotient maps
    1. Binary relations
    2. Equivalence relations
    3. Quotient maps
    4. Ordered sets
    Exercises
  §7. The principle of mathematical induction
  §8. Integer arithmetic
    1. The fundamental theorem of arithmetic
    2. g.c.d. and l.c.m. in Z
    3. The division algorithm in Z
    Exercises

Chapter 2. Vector spaces. Matrices
  §1. Vector spaces
    1. Motivation
    2. Basic definitions
    3. Linear combinations. Linear span
    4. Linear dependence
    5. Bases. Dimension
    Exercises
  §2. The rank of a matrix
    1. Back to equations
    2. The rank of a matrix
    3. Solvability criterion
    Exercises




  §3. Linear maps. Matrix operations
    1. Matrices and maps
    2. Matrix multiplication
    3. Square matrices
    Exercises
  §4. The space of solutions
    1. Solving a homogeneous linear system
    2. Linear manifolds. Solving a non-homogeneous system
    3. The rank of a product of matrices
    4. Equivalence classes of matrices
    Exercises

Chapter 3. Determinants
  §1. Determinants: construction and basic properties
    1. Construction by induction
    2. Basic properties of determinants
    Exercises
  §2. Further properties of determinants
    1. Expanding the determinant along an arbitrary column
    2. The properties of determinants relating to columns
    3. The transpose determinant
    4. Determinants of special matrices
    5. Building up a theory of determinants
    Exercises
  §3. Applications of determinants
    1. Criterion for a matrix to be non-singular
    2. [illegible]
    3. [illegible]

Chapter 4. Algebraic structures (groups, rings, fields)
  §1. Sets with algebraic operations
    1. Binary operations
    2. Semigroups and monoids
    3. Generalized associativity; powers
    4. Invertible elements
    Exercises
  §2. Groups
    1. Definition and examples
    2. Systems of generators
    3. Cyclic groups
    4. The symmetric group and the alternating group
    Exercises
  §3. Morphisms of groups
    1. Isomorphisms
    2. Homomorphisms
    3. Glossary. Examples
    4. Cosets of a subgroup
    5. The monomorphism S_n → GL(n)
    Exercises
  §4. Rings and fields
    1. The definition and general properties of rings
    2. Congruences. The ring of residue classes
    3. Ring homomorphisms and ideals
    4. The concept of quotient group and quotient ring
    5. Types of rings. Fields




    6. The characteristic of a field
    7. A remark on linear systems
    Exercises

Chapter 5. Complex numbers and polynomials
  §1. The field of complex numbers
    1. An auxiliary construction
    2. The complex plane
    3. Geometrical interpretation of operations with complex numbers
    4. Raising to powers and extracting roots
    5. Uniqueness theorem
    Exercises
  §2. Rings of polynomials
    1. Polynomials in one variable
    2. Polynomials in several variables
    3. The division algorithm
    Exercises
  §3. Factorization in polynomial rings
    1. Elementary divisibility properties
    2. g.c.d. and l.c.m. in rings
    3. Unique factorization in Euclidean rings
    4. Irreducible polynomials
    Exercises
  §4. The field of fractions
    1. Construction of the field of fractions of an integral domain
    2. The field of rational functions
    3. Primary rational functions
    Exercises

Chapter 6. Roots of polynomials
  §1. General properties of roots
    1. Roots and linear factors
    2. Polynomial functions
    3. Differentiation in polynomial rings
    4. Multiple factors
    5. Vieta's formulas
    Exercises
  §2. Symmetric polynomials
    1. The ring of symmetric polynomials
    2. The fundamental theorem on symmetric polynomials
    3. The method of undetermined coefficients
    4. The discriminant of a polynomial
    5. The resultant
    Exercises
  §3. C is algebraically closed
    1. Statement of the fundamental theorem
    2. The splitting field of a polynomial
    3. Proof of the Fundamental Theorem
  §4. Polynomials with real coefficients
    1. Factorization in R[X]
    2. The problem of isolating the roots of a polynomial
    3. Stable polynomials
    Exercises




PART II. Groups, Rings, Modules
Further reading

Chapter 7. Groups
  §1. Classical groups in low dimensions
    1. General definitions
    2. Parametrization of SU(2) and SO(3)
    3. The epimorphism SU(2) → SO(3)
    4. Geometrical characterization of SO(3)
    Exercises
  §2. Group actions on sets
    1. Homomorphisms G → S(Ω)
    2. The orbit and stationary subgroup of a point
    3. Examples of group actions on sets
    4. Homogeneous spaces
    Exercises
  §3. Some group theoretic constructions
    1. General theorems on group homomorphisms
    2. Solvable groups
    3. Simple groups
    4. Products of groups
    5. Generators and defining relations
    Exercises
  §4. The Sylow theorems
    Exercises
  §5. Finite abelian groups
    1. Primary abelian groups
    2. The structure theorem for finite abelian groups
    Exercises

Chapter 8. Elements of representation theory
  §1. Definitions and examples of linear representations
    1. Basic concepts
    2. Examples of linear representations
    Exercises
  §2. Unitary and reducible representations
    1. Unitary representations
    2. Complete reducibility
    Exercises
  §3. Finite rotation groups
    1. The orders of finite subgroups of SO(3)
    2. Symmetry groups for regular polyhedra
    Exercises
  §4. Characters of linear representations
    1. Schur's lemma and a corollary
    2. Characters of representations
    Exercises
  §5. Irreducible representations of finite groups
    1. The number of irreducible representations
    2. The degrees of the irreducible representations
    3. Representations of abelian groups
    4. Representations of certain special groups
    Exercises
  §6. Representations of SU(2) and SO(3)
    Exercises




  §7. Tensor products of representations
    1. The dual representation
    2. Tensor products of representations
    3. The ring of characters
    4. Invariants of linear groups
    Exercises

Chapter 9. Toward a theory of fields, rings and modules
  §1. Finite field extensions
    1. Primitive elements and the degree of an extension
    2. Isomorphism of splitting fields
    3. Finite fields
    4. The Möbius inversion formula and its applications
    Exercises
  §2. Various results about rings
    1. More examples of unique factorization domains
    2. Ring theoretic constructions
    3. Number theoretic applications
    Exercises
  §3. Modules
    1. Basic facts about modules
    2. Free modules
    3. Integral elements of a ring
    4. Unimodular sequences of polynomials
  §4. Algebras over a field
    1. Definitions and examples of algebras
    2. Division rings (skew fields)
    3. Group algebras and modules over them
    4. Non-associative algebras
    Exercises

Appendix. The Jordan normal form of a matrix
Hints to the exercises
Index


Foreword 




This book was written to give a systematic exposition of the course in algebra for 
students of the Mechanics-Mathematics Faculty of Moscow University that has developed in 
recent years. The natural evolution of the standard syllabus necessitated at least a
partial re-working and modernization of the textual material in algebra. 

Formally, the book is divided into two parts, in rough correspondence to the algebra 
courses taught in the first and third semesters at Moscow University. In Part II we assume
that the reader is well grounded in the theory of abstract vector spaces and linear operators 
-- material which is studied in the second semester course in linear algebra and geometry. 
However, real vector spaces are presented in Chapter 2 of Part I, several concepts of 
linear algebra are developed along the way in the text, and a small appendix contains the 
geometric theory of matrix reduction to Jordan normal form. The book can therefore be
studied independently of any other sources.

A significant role is played by the problems at the end of most of the sections.
Because of the availability of excellent exercise books in algebra, it seemed pointless here
to emphasize numerical calculations; therefore, the problems have a more substantive
character, and help to develop the basic ideas. In several cases they are referred to in the
body of the text, but all such exercises are supplied with detailed hints so that there will be
no difficulty in solving them. We recommend that the reader look at these hints as little as
possible, only after persistent attempts on his own to solve the problem.

It is probably unrealistic to expect that a small number of lecture hours will be 
sufficient to cover the contents of the entire book. This is especially true of Part II, the 
material of which is not completely traditional. This material includes a fair amount of 
intuitive motivation, but certain "delicacies" (such as the Sylow theorems, invariants of 
linear groups, representations of rotation groups, and non-associative algebras) are more 
advanced, and are consciously directed toward enthusiasts who may be stimulated to further 
study. 

After studying the fairly difficult 7th chapter, one should decide whether to concentrate
on elements of representation theory (Chapter 8) or on the general theory of rings,
modules and fields, which is touched upon in Chapter 9 (where it was not possible, however, 
to go deeply into structural questions). The first choice seems preferable not only because 
of its connection with geometry and the material of the second semester course in linear 
algebra, but also because knowledge of the basic facts about group representations is very 
useful to mathematicians who specialize in fields other than algebra. It is extremely 
desirable to solidify one's understanding of group representations, which are illustrated in 
the book using only a few basic examples, by studying further applications. Examples of 
themes for further study are Galois theory, groups generated by reflections (including 
crystallographic groups), representations of compact groups, and so on. On the other hand,
Chapter 9, with its number theoretic slant, more closely corresponds to the usual syllabus 
in algebra. In any case, either choice of emphasis provides a foundation for further work in 
algebra. 

The beginning of each part of the book contains a small list of supplementary 
literature, which makes no claim to completeness. 

One point should be clearly stated, since it may not be obvious to the beginning
student. A course in higher algebra, despite its name, can in no way reflect the full gamut
of modern algebra. It is for this reason that the book is called an "introduction". A further
purpose of an introduction is to be a sourcebook of concepts and results needed for the study
of other fields of mathematics. The importance of learning the language of algebra will
become immediately apparent to anyone who attempts independent study of mathematics
without first acquiring this knowledge.

Despite its elementary character, the traditional course in algebra presented
difficulties to the student because of the inherently formal nature of algebraic thinking. The
author had this constantly in mind, and for this reason attempted to emphasize connections
between algebra and other areas of the mathematical sciences. It is unfortunate that
elements of category theory and partially ordered systems did not find a place in the book.
However, it would have been pointless to overload an introductory course with a conglomerate
of abstract notions which would tend to kill interest in those subjects because of the
inevitable superficiality of their exposition.


Many different variants of a required algebra course were put into practice in the
Mechanics-Mathematics Faculty of Moscow University over the last ten or fifteen years.
It is reasonable to hope that the present
realization in book form of the recently adopted version of the course will be useful for 
students and instructors at other colleges as well, and also for those who would like to begin 
independent study of algebra. Of course, the order and the degree of completeness with 
which the material in the book is presented in lectures will depend strongly on the concrete 
circumstances and pedagogical traditions in each college. 

The author is very grateful to the experienced teaching staff of the Department of 
Higher Algebra of Moscow University, and wishes to thank those who gave much useful 
advice for presenting the course. All constructive suggestions and remarks concerning 
errors and misprints will be gratefully accepted. 


A. Kostrikin 


Zvenigorod 
July, 1976 


Advice to the Reader 


As explained in the Foreword, the interdependence of the chapters is as follows: 


[Diagram showing the interdependence of the chapters]


(the broken arrow indicates weak dependence). Of course, an experienced reader (such as
an instructor or an advanced student) will have no trouble beginning to read from almost any 
place, if he is willing from time to time to turn back to the definitions in the earlier sections 
and chapters. Not all new concepts are introduced in paragraphs beginning with the word 
"Definition". The detailed Table of Contents and the Index can be used to find the needed 
place in the book.

Each chapter is divided into several sections, and each section is divided into
several subsections with appropriate headings. The theorems, propositions, lemmas, and
corollaries within each section have their own numbering: Theorem 1, Theorem 2, ...;
Lemma 1, Lemma 2, ... . With this primitive but simple system of numbering, when
referring to assertions in another section we must write, for example, Theorem i §j, or
even Theorem i §j Ch. k; but this will not cause any difficulties.

The end (or absence) of a proof is indicated by the sign □.

For brevity, we use the simplest logical symbols. The implication sign ==> in
"A ==> B" has the simple meaning "A implies B" or "B follows from A", while
"A <==> B" means that assertions A and B are equivalent (A if and only if B).
The general quantifier ∀ replaces the expression "for all". The other notation will be
clear from the context.
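
As a small illustration of how these symbols read in practice, here are two statements about integers written in this notation. The statements are our own examples, not taken from the text, and the LaTeX below assumes the amsmath and amssymb packages are loaded.

```latex
% "For all integers n: if n is even, then n^2 is even."
\[
  \forall\, n \in \mathbb{Z}:\quad
  n \text{ is even} \;\Longrightarrow\; n^{2} \text{ is even}
\]
% "For all integers a and b: a - b = 0 if and only if a = b."
\[
  \forall\, a, b \in \mathbb{Z}:\quad
  a - b = 0 \;\Longleftrightarrow\; a = b
\]
```

The first statement is a one-way implication (its converse also happens to hold, but the arrow does not assert it), while the second asserts equivalence in both directions.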

Below we give the full Greek alphabet, indicating the pronunciation of the letters.
Any confusion here can be annoying, since the letters of the Greek alphabet are very widely
used in mathematics.


GREEK ALPHABET 


Α α      Β β      Γ γ      Δ δ      Ε ε        Ζ ζ      Η η      Θ θ      Ι ι       Κ κ      Λ λ
alpha    beta     gamma    delta    epsilon    zeta     eta      theta    iota      kappa    lambda


Μ μ      Ν ν      Ξ ξ      Ο ο      Π π        Ρ ρ      Σ σ      Τ τ      Υ υ       Φ φ      Χ χ
mu(myu)  nu(nyu)  xi(ksee) omicron  pi         rho      sigma    tau      upsilon   phi      chi(ki)


Ψ ψ      Ω ω
psi      omega


"L'algèbre est généreuse, elle donne souvent
plus qu'on lui demande".
("Algebra is generous; she often gives more than is asked of her.")


-- d'Alembert 


Part One 
Foundations of Algebra 


This part can be considered "algebra in miniature". The fundamental concepts of
groups, rings and fields, which are unfamiliar to the beginning student, are introduced
informally and in small doses, although the total number of interrelated ideas presented to
the reader turns out to be quite large. The definitions and theorems should not be
memorized: they will become familiar after working independently on the problems and
exercises. It is helpful to concentrate on a few of the most widely used algebraic systems
(the groups (Z,+), S_n, A_n, GL(n), SL(n); polynomial rings; the fields Q, R, C
and Z_p) which serve to illustrate the language of algebra. In accordance with tradition
and considerations of compatibility between high school and college, we first present
matrices and determinants, which are used to find and study the solutions of systems of
linear equations. Along the way, basic algebraic structures arise in a natural way.


Further Reading 
Z. I. Borevich and I. R. Shafarevich, Number Theory, Academic Press (New York), 1966.
H. Davenport, The Higher Arithmetic, Hutchinson's Univ. Library (London), 1952.
D. K. Faddeev and I. S. Sominskii, Problems in Higher Algebra, W. H. Freeman (San
Francisco), 1965.
I. Herstein, Topics in Algebra, Xerox College Publishing (Lexington, Mass.), 1975.
K. Ireland and M. Rosen, Elements of Number Theory, Bogden and Quigley, 1972.
S. Lang, Algebra, Addison-Wesley, 1971.
B. L. van der Waerden, Algebra, Frederick Ungar (New York), 1970.


Chapter 1. Foundations of Algebra 


What does algebra start from? To a certain extent, one can say that the sources of 
algebra are implicit in the art of adding and multiplying integers and raising them to powers. 
If one formally replaces integers with letters -- a step which is far from obvious and can 
be carried out in many ways -- one can then proceed according to similar rules in much 
more general algebraic systems. In fact, an attempt to give an exhaustive answer to our 
question would take us not only far back through the ages, but also into the mysteries of the 
emergence of mathematical thought. A difficult part of the answer to this question would 
consist of describing the basic structures of the algebra of our day: groups, rings, fields, 
modules, and so on. But the entire book is devoted precisely to this, so that the goal of 
Chapter I seems at this point to be out of reach. 

Fortunately, under the abstract shell of most axiomatic theories of algebra one 
finds very concrete problems of a theoretical or practical nature, whose solution once 
served as a fortuitous and sometimes indispensable stimulus to far-reaching generalizations. 
The development of a general theory, in turn, gave an impulse and a technique for the 
solution of new problems. The complicated interaction between the theoretical and practical
aspects, which is inherent in all mathematics, takes an especially pronounced form in
algebra and to some extent provides a justification for the concentric style of presentation
adopted in this book.

After some brief general remarks on the history of the subject, we shall formulate 
several problems which motivate the material in the chapters which follow. One of these 
problems is the point of departure for our study of systems of linear equations and the theory 
of matrices and determinants. We shall give Gauss’s method and thus obtain our first facts 
about the solutions of linear systems. 

At this stage it will already be useful to introduce some standard notation and
terminology, which will be done in a brief survey of set theory. We shall introduce the
important concepts of an equivalence relation and a quotient map. Further, in order to 
explain the principle of mathematical induction, we establish some elementary combinatoric 
relations. Finally, the simple arithmetic properties of the integers which are given in the 
last section are not only used throughout the subsequent chapters, but are also the prototype 
for constructing similar rules of arithmetic in more complicated algebraic systems. 

The material in this chapter does not go far beyond high school mathematics. The 
reader is only required to adopt a more general point of view. 


The student may begin reading in §3.


§1. Algebra in brief


For good reason one often hears these days about the "algebraization" of mathematics,
i.e., the penetration of algebraic ideas and methods into both theoretical and practical
fields of mathematics. This state of affairs, which became completely apparent in the
middle of the twentieth century, has by no means always been with us. Like every area of
human endeavor, mathematics is subject to the influence of fashion. The fashion for
algebraic methods had substantive causes, but sometimes enthusiasm for these methods
exceeds reasonable boundaries. And since an algebraic shell which obscures the content is
no less of a disaster than basic ignorance of algebra, it has become customary (justifiably
so) for books to be praised whenever the author manages to avoid an overloading of algebraic
formalism.


While avoiding extremes, one should realize that algebra has from time immemorial 
made up an essential part of mathematics. The same could just as well be said about 
geometry, but here we should cite the opinion of Sophie Germain (nineteenth century): 
"Algebra is nothing but geometry in symbols, and geometry is nothing but algebra in 
pictures". The situation has changed somewhat since then, but it still seems that "the ‘true 
nature’ of mathematical objects is really of secondary importance, and it does not much 
matter, for example, whether we describe a result as a theorem in ‘pure’ geometry or, 
using analytic geometry, as an algebraic theorem" (N. Bourbaki). 

According to the principle that "it is not the mathematical objects which are
important, but the relationship between them", algebra can be defined (in a way that is
somewhat tautological and is completely incomprehensible to the uninitiated) as the science of
algebraic operations performed on elements of various sets. The algebraic operations 
themselves grew out of elementary arithmetic. Algebraic ideas, in turn, give the most 
natural proofs of many facts of "higher arithmetic", number theory. 

But the significance of algebraic structures -- sets with algebraic operations -- 
goes far beyond number-theoretic applications. Many mathematical objects (topological 
spaces, differential equations, functions of several complex variables, etc.) are studied by
first constructing suitable algebraic structures; even if these structures do not tell the whole
story about the objects under consideration, they often reflect their most important properties.
The same can be said about applications of algebra to the real world. 

The definitive opinion on this subject was given more than 45 years ago by P. Dirac,
who was one of the founders of quantum mechanics: "Modern physics increasingly requires
abstract mathematics and the development of its foundations. Thus, non-Euclidean
geometry and non-commutative algebra, which were once considered to be merely the fruit
of imagination or fascination with logical reasoning, are now recognized to be very necessary
to describe the general picture of the physical world".


Algebraic techniques are very useful in studying elementary particles in quantum
mechanics, investigating the properties of rigid solids and crystals (here the theory of
group representations is especially important), analyzing models in economics,
constructing modern computers, and so on and so forth.


Algebra, in turn, is nourished by the life-blood of other disciplines, including the
other mathematical disciplines. For example, homological methods in algebra grew out of
topology and algebraic number theory.


It is not surprising that the appearance of algebra and the way it is viewed change
over time. Here we cannot give a detailed account of these changes, not only because of
lack of space, but, even more, because such a historical discussion must be concrete, and
this is only possible after a basic knowledge of the subject has been acquired.


We shall only give a schematic list of names and periods. 


The ancient civilizations of Babylonia and Egypt. Greek civilization. The "Arithmetic" of
Diophantus (3rd century A.D.).

    Arithmetic operations on integers and positive rational numbers. Algebraic formulas in
    geometry and astronomy. Formulation of construction problems (doubling the cube and
    trisecting an angle) which occupied algebraic minds at a much later time.

Eastern civilizations of the Middle Ages. The work "ilm al-jabr wa'l muqabalah" by
Mohammed ibn Musa al-Khowarizmi (approx. 825).

    Algebraic equations of degree one and two. Introduction of the term "algebra".

Renaissance. Fibonacci (Leonardo of Pisa) (approx. 1170-1250), S. Ferro (1465-1526),
N. Tartaglia (1500-1557), G. Cardano (1501-1576), L. Ferrari (1522-1565),
F. Vieta (1540-1603), R. Bombelli (1530-1572).

    Solution of general algebraic equations of degree three and four. Creation of modern
    algebraic symbolism.

XVII - XVIII centuries. R. Descartes (1596-1650), P. Fermat (1601-1665),
I. Newton (1643-1727), G. Leibniz (1646-1716), L. Euler (1707-1783),
J. d'Alembert (1717-1783), J.-L. Lagrange (1736-1813), G. Cramer (1704-1752),
P. Laplace (1749-1827), A. Vandermonde (1735-1796).

    Emergence of analytic geometry -- a solid bridge between geometry and algebra.
    Increasing activity in number theory. Development of the algebra of polynomials.
    Intensive search for general formulas for solutions of algebraic equations. The first
    approaches to proving the existence of a root of an equation with numerical coefficients.
    The beginnings of the theory of determinants.

XIX - early XX centuries. K. F. Gauss (1777-1855), P. Dirichlet (1805-1859),
E. Kummer (1810-1893), L. Kronecker (1823-1891), R. Dedekind (1831-1916),
E. I. Zolotarev (1847-1878), G. F. Voronoi (1868-1908), A. A. Markov (1856-1922),
P. L. Chebyshev (1821-1894), C. Hermite (1822-1901), N. I. Lobachevskii (1792-1856),
A. Hurwitz (1859-1919), P. Ruffini (1765-1822), N. H. Abel (1802-1829),
C. Jacobi (1804-1851), E. Galois (1811-1832), G. Riemann (1826-1866),
A. L. Cauchy (1789-1857), C. Jordan (1838-1922), L. Sylow (1832-1918),
H. Grassmann (1809-1877), J. Sylvester (1814-1897), A. Cayley (1821-1895),
W. Hamilton (1805-1865), G. Boole (1815-1864), S. Lie (1842-1899),
F. Frobenius (1849-1918), J. Serret (1819-1885), M. Noether (1844-1922),
D. A. Grave (1863-1939), H. Poincaré (1854-1912), F. Klein (1849-1925),
W. Burnside (1852-1927), J. Schur (1885-1941), H. Weyl (1885-1955),
F. Enriques (1871-1946), J. von Neumann (1903-1957), D. Hilbert (1862-1943),
E. Cartan (1869-1951), K. Hensel (1861-1941), E. Steinitz (1871-1928),
E. Noether (1882-1935), E. Artin (1898-1962), N. Bourbaki, "Elements of Mathematics".

    Proof of the basic existence theorem for roots of an equation with numerical
    coefficients. Intensive development of algebraic number theory.

    Search for methods of approximate solution of algebraic equations. Conditions on the
    coefficients which ensure a certain location for the roots.

    Proof of the unsolvability in radicals of the general equation of degree n ≥ 5.
    Development of the theory of algebraic functions. Creation of Galois theory. The
    beginnings of the theory of finite groups, mainly based on permutation groups.

    Intensive development of methods of linear algebra. The emergence, after the discovery
    of quaternions, of the theory of hypercomplex systems (such systems are now called
    algebras). In particular, in connection with the development of the theory of continuous
    groups (Lie groups), the foundations were laid for the theory of Lie algebras. Algebraic
    geometry and the theory of invariants became important branches of mathematics. In the
    XIX century, mathematics had not yet become highly specialized, and many leading
    scientists worked successfully in several areas.

    The first half of the XX century saw a radical reconstruction of the entire edifice of
    mathematics. Algebra gave up the title of the science of algebraic equations and took a
    decisive step along an axiomatic and much more abstract path of development. The
    language of rings, modules, categories, homology came into wide use. Many diverse
    theories fit into the general scheme of universal algebra. The theory of models arose in
    the overlap between algebra and mathematical logic. Old theories were rejuvenated,
    broadening their applications. Examples are modern algebraic geometry, algebraic
    topology, algebraic K-theory, the theory of algebraic groups. The theory of finite
    groups had many bright moments.

All of algebra is now in a state of dynamic development. Among the Soviet
mathematicians who have made great contributions to this research are N. G. Chebotarev
(1894-1947), O. Ju. Shmidt (1891-1956), A. I. Mal'tsev (1909-1967), A. G. Kurosh
(1908-1971), P. S. Novikov (1901-1975),

§2. Some model problems 


The four problems below are at different levels of difficulty. The first three, which
themselves are not all at the same level, are designed exclusively to motivate the study of
different types of fields, vector spaces, groups, and group representations, i.e., the
algebraic theories which will be discussed later in the book. Many specialized monographs
are devoted to "solving" these problems. The fourth problem, which motivates the study of
linear systems, is worthwhile for the reader to try to solve right now, without looking at
the next section, which contains the necessary steps.


1. Solvability of equations in radicals. The formula

$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \qquad (1)$$

for the solutions $x_1, x_2$ of the quadratic equation $ax^2 + bx + c = 0$ is well known from
elementary algebra.


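A cubic equation $x^3 + ax^2 + bx + c = 0$ takes the form $x^3 + px + q = 0$ after
performing the substitution $x \mapsto x - \frac{1}{3}a$. Let $x_1, x_2, x_3$ be the three roots of the
equation $x^3 + px + q = 0$. If we set

$$D = -4p^3 - 27q^2, \qquad u = \sqrt[3]{-\tfrac{27}{2}q + \tfrac{3}{2}\sqrt{-3D}}, \qquad v = \sqrt[3]{-\tfrac{27}{2}q - \tfrac{3}{2}\sqrt{-3D}} \qquad (2)$$

(where the cube roots must be chosen so that $uv = -3p$), then it is possible to show that,
with $\varepsilon = \frac{-1 + \sqrt{-3}}{2}$,

$$x_1 = \tfrac{1}{3}(u + v), \qquad x_2 = \tfrac{1}{3}(\varepsilon^2 u + \varepsilon v), \qquad x_3 = \tfrac{1}{3}(\varepsilon u + \varepsilon^2 v). \qquad (3)$$
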
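As a numerical illustration of formulas (2) and (3), the following Python sketch computes the three roots of $x^3 + px + q = 0$. The function name `cardano_roots` and the way the cube root is selected (take the principal complex cube root for u, then set v = -3p/u so that uv = -3p) are choices made for this illustration only; they are not taken from the text.

```python
import cmath

def cardano_roots(p, q):
    """Roots of x^3 + p*x + q = 0 computed from formulas (2) and (3)."""
    D = -4 * p**3 - 27 * q**2
    w = cmath.sqrt(-3 * D)                      # complex square root of -3D
    u = (-27 * q / 2 + 3 * w / 2) ** (1 / 3)    # principal complex cube root
    if abs(u) > 1e-12:
        v = -3 * p / u                          # forces the condition u*v = -3p
    else:
        # u = 0 can only happen when p = 0, and then any cube root may be used for v
        v = (-27 * q / 2 - 3 * w / 2) ** (1 / 3)
    eps = (-1 + cmath.sqrt(-3)) / 2             # primitive cube root of unity
    return [(u + v) / 3,
            (eps**2 * u + eps * v) / 3,
            (eps * u + eps**2 * v) / 3]

# x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)
for root in cardano_roots(-7, 6):
    print(root, abs(root**3 - 7 * root + 6))
```

Because the computation is done in floating-point complex arithmetic, the printed roots are only approximations; the second printed number is the residual of each root in the equation.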


Formulas (2) and (3), which are known as Cardano's formulas (1545) and are also
associated with the names of other Italian Renaissance mathematicians (Ferro, Tartaglia),
are valid, just like formula (1), for absolutely any values of the letters a, b, c, p, q,
for example, for any rational values. Similar formulas were found for the roots of a fourth
degree equation. Then for almost three hundred years mathematicians attempted
unsuccessfully to "solve in radicals" the general fifth degree equation. It was only in 1813
that Ruffini (in rough form) and in 1827 that Abel (independently and completely rigorously)
proved the theorem that the general equation $x^n + a_1 x^{n-1} + \cdots + a_n = 0$ cannot be
solved in radicals if n > 4. The fundamental discovery in this field was made by
twenty-year-old Evariste Galois in 1831 (his work only became known in 1846), when he gave a
general criterion for any equation (say, with rational coefficients), not just the general
n-th degree equation, to be solvable in radicals.

To every polynomial (or equation) of degree n Galois associated a "splitting field"
and a finite family (of cardinality no greater than n!) of so-called "automorphisms" of
this field. These automorphisms are now called the "Galois group" of the field (or of the
original polynomial). Although we shall not dwell on Galois theory in detail, Chapter 7
contains an intrinsic characterization of the special class of so-called "solvable" groups.

It turns out that an equation of degree n with rational coefficients is solvable in radicals
if and only if the corresponding Galois group is a solvable group. For example, suppose that
we are given the fifth degree equation $x^5 - ax - 1 = 0$, where a is some integer. This
equation corresponds to a Galois group $G_a$, which depends in some complicated way on
a. $G_0$ is the cyclic group of order 4 (and all cyclic groups are solvable, by definition),
and the equation $x^5 - 1 = 0$ is, of course, solvable in radicals. On the other hand, $G_1$
has the same structure as the symmetric group $S_5$, which has order 120, and, as we
show in Chapter 7, this group is not solvable. Hence, the equation $x^5 - x - 1 = 0$ is
not solvable in radicals.

In conclusion, we note that the possibility of expressing a root of an algebraic equation
explicitly in terms of radicals is not very important from a practical standpoint;
approximation methods are more relevant for computations. But this does not diminish the beauty of
Galois' achievement, which had a profound conceptual influence on the subsequent
development of mathematics. To begin with, it was Galois theory that set the stage for group theory.
In the XX century, Galois' one-to-one correspondence between subfields of the splitting field
and subgroups of its Galois group has been generalized and enriched with new abstract
constructions, so that now this correspondence provides an indispensable tool for studying
mathematical objects.


2. The states of a molecule. Every molecule can be considered as a system of
particles, i.e., atomic nuclei (surrounded by electrons). If the system's configuration at
the initial moment of time is close to an equilibrium configuration, then, under certain 
conditions, the particles in the system will always remain close to equilibrium positions, 
and will not acquire large velocities. Motion of this type is called oscillation relative to the 
equilibrium configuration, and such a system is called stable. It is known that any small 
oscillation of the molecule near a position of stable equilibrium is a superposition of so- 
called "normal" oscillations. In many cases it is possible to determine the potential energy 
of the molecule and its normal frequencies by taking into account the internal symmetries of 
the molecule. The symmetry of the molecular structure is described by the "point group" 
of the molecule. Different realizations of this finite group (its irreducible representations) 
and functions on the group which are associated to these realizations (characters of the 
representations) give parameters of the oscillations of the molecule. 

For example, the water molecule $H_2O$ (Fig. 1) corresponds to the Klein four-group
(the direct product of two cyclic groups of order two); and the phosphorus molecule
$P_4$ (Fig. 2), which has the form of a regular tetrahedron with phosphorus atoms at the
vertices, corresponds to the symmetric group $S_4$, which has order 24. The irreducible
representations of these groups will be studied in Chapter 8. Nowadays it is hard to
imagine how the structure theory of molecules could have developed without the use of group
theory.


[Fig. 1: the water molecule. Fig. 2: the phosphorus molecule.]


Much earlier applications of group theory are found in crystallography. As early as
1891, the great Russian crystallographer Fedorov, and then the German scientist
Schoenflies found the 230 crystallographic space-groups which describe all crystal
symmetries which are found in nature. Ever since then, group theory has been continually
used to study the influence of symmetry on the physical properties of crystals.


3. Coding information. In constructing automatic communication systems, on
earth or in the cosmos, one usually takes the basic message to be an ordered sequence,
which we call a row (or word): $a = (a_1, \ldots, a_n)$ of length n, where $a_i = 0$ or
1. Since the usual operations of addition and multiplication modulo 2 are well suited for
execution on an electronic machine, and the symbols 0 and 1 are themselves easily
transmitted as electronic signals (1 and 0 are distinguished by how successive signals
are separated, or else one corresponds to a signal and the other to its absence), it is not
surprising that the field GF(2) (see §4 of Chapter 4) is used constantly by the specialist
in information theory. It is sometimes convenient to take the $a_i$ to be elements of other
finite fields.

If one wants to minimize the influence of static (atmospheric and cosmic interference),
which can turn 0's into 1's and vice-versa, one must take a to be sufficiently long
and use a special coding system, i.e., a choice of a subset $S_0$ of admissible rows from
among the set S of all possible words. $S_0$ is called the code, and its elements are
called code-words. In that way it is possible to reconstruct a from a distorted word a',
provided that there weren't too many erroneous signals.

In this way, error-correcting codes arise. Algebraic coding theory has been developing
rapidly in recent years, and now includes many clever methods of coding. This theory is
largely concerned with the special linear codes which are obtained when the choice of $S_0$ is
connected with the construction of special rectangular matrices and the solution of systems
of linear equations whose coefficients belong to a given finite field. A simple example of such
a code will be given in Chapter 5.
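To make the idea of reconstructing a row from a distorted word concrete, here is a minimal Python sketch of the triple repetition code over GF(2). This is a standard textbook illustration chosen for its brevity, not the example announced for Chapter 5; the function names `encode` and `decode` are hypothetical.

```python
def encode(bits):
    """Repeat every symbol three times; the code S_0 consists of all rows of this shape."""
    return [b for b in bits for _ in range(3)]

def decode(word):
    """Recover each symbol by majority vote inside its block of three received symbols."""
    return [1 if sum(word[i:i + 3]) >= 2 else 0 for i in range(0, len(word), 3)]

message = [1, 0, 1]
sent = encode(message)          # [1, 1, 1, 0, 0, 0, 1, 1, 1]
received = sent.copy()
received[1] ^= 1                # static flips one symbol (adding 1 modulo 2)
received[8] ^= 1                # and another symbol in a different block
print(decode(received) == message)
```

The sketch corrects the distortion as long as at most one symbol is flipped in each block of three; two errors in the same block are decoded incorrectly.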


4. The heated plate problem. A flat rectangular plate with three holes (Fig. 3) is
used as a valve in an imaginary set-up for obtaining low temperatures. It is covered with
a square net (grid). The vertices of the grid which lie on the four contours are called
boundary vertices, and all the other ones are called interior vertices. Experiments show
that, during any heating or cooling, the temperature at any interior vertex is the arithmetic
mean of the temperatures at the four nearest vertices (interior or boundary). We would like
the temperature at the vertices along the contours to have the values indicated in Fig. 3. Is
this possible, and, if it is possible, is the distribution of temperatures at the interior
vertices uniquely determined?


§3. Systems of linear equations. The first steps


Linear equations ax = b and systems of the type

$$\begin{aligned} ax + by &= e, \\ cx + dy &= f \end{aligned} \qquad (1)$$

with real coefficients a, b, c, d, e, f are "solved" in high school. Our purpose is to
learn how to work with a system of linear algebraic equations (or briefly: a linear system)
of the most general type:


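$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned} \qquad (2)$$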


Here m and n are arbitrary positive integers: m is the number of equations and n
is the number of unknowns. The simple step of letting m and n be greater than two in
passing from (1) to (2) is of major importance. Systems of type (2) occur literally in
every branch of mathematics, and so-called "linear methods", whose end products are
often the solutions of linear systems, constitute the most developed parts of mathematics.
For example, at the end of the XIX century the theory of systems of type (2) served as a
prototype for the creation of a theory of integral equations which plays a vital role in
mechanics and physics. A large number of practical problems which are handled by computer
also reduce to systems of type (2).


1. Terminology. Note the following efficient and convenient notation for the
coefficients of (2): the coefficient $a_{ij}$ (read "a-i-j", so that, for example, $a_{12}$ is
a-one-two, never a-twelve) is the coefficient of the j-th unknown $x_j$ in the i-th
equation. The number $b_i$ is called the free term (or constant term) of the i-th equation.
The system (2) is called homogeneous if $b_i = 0$ for $i = 1, 2, \ldots, m$. Given any
system (2), the linear system

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned} \qquad (2_0)$$

is called the homogeneous system associated to (2).


The coefficients of the unknowns make up a rectangular table

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix} \qquad (3)$$

which is called an m×n matrix (a square matrix if m = n). Such a matrix is written in
abbreviated form as $(a_{ij})$, or else is denoted simply by the letter A. It is natural to
call $(a_{i1}, a_{i2}, \ldots, a_{in})$ the i-th row of the matrix (3), and to call

$$\begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix}$$

the j-th column. To economize on space, we shall denote a column by writing a row in
brackets: $[a_{1j}, a_{2j}, \ldots, a_{mj}]$. For a square matrix we speak of its main diagonal,
which consists of the elements $a_{11}, a_{22}, \ldots, a_{nn}$. A matrix $(a_{ij})$ all of whose elements
not on the main diagonal are zero is sometimes denoted $\mathrm{diag}(a_{11}, a_{22}, \ldots, a_{nn})$ and is
called a diagonal matrix. A diagonal matrix with $a_{11} = a_{22} = \cdots = a_{nn} = a$ is denoted
diag(a) and is called a scalar matrix. The matrix diag(1), which has 1's on the
main diagonal and zeros elsewhere, is called the identity matrix and is usually denoted
$E_n$, or simply E, when the dimension of the matrix is fixed in the discussion.

Besides the matrix (3), we also consider the extended matrix $(a_{ij} \mid b_i)$ of the
system (2), which is obtained from (3) by adding on the column of constant terms
$[b_1, b_2, \ldots, b_m]$; for clarity, this column is separated from the other columns by a
vertical line.
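The terminology above translates directly into a small data representation. The following Python sketch stores a matrix as a list of rows; the helper names (`row`, `column`, `diag`, `identity`, `extended_matrix`) are hypothetical and use the 1-based indices of the text.

```python
def row(A, i):
    """The i-th row (a_i1, ..., a_in), with i counted from 1 as in the text."""
    return list(A[i - 1])

def column(A, j):
    """The j-th column [a_1j, ..., a_mj], with j counted from 1."""
    return [r[j - 1] for r in A]

def diag(*entries):
    """The diagonal matrix diag(a_11, ..., a_nn)."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def identity(n):
    """The identity matrix E_n: 1's on the main diagonal, zeros elsewhere."""
    return diag(*([1] * n))

def extended_matrix(A, b):
    """The extended matrix (a_ij | b_i): append the column of constant terms."""
    return [list(r) + [b_i] for r, b_i in zip(A, b)]

A = [[1, 2], [3, 4]]
print(column(A, 2))                  # [2, 4]
print(identity(2))                   # [[1, 0], [0, 1]]
print(extended_matrix(A, [5, 6]))    # [[1, 2, 5], [3, 4, 6]]
```

In this representation the vertical line of the extended matrix is implicit: the last entry of each row is the constant term.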
If each of the equations in (2) becomes an identity when the unknowns $x_i$ are
replaced by numbers $x_i^0$, then we call the set of n numbers $x_1^0, x_2^0, \ldots, x_n^0$ a
solution of the system (2), and we call $x_i^0$ the i-th component of the solution. We
also say that the n-tuple $x_1^0, x_2^0, \ldots, x_n^0$ of numbers satisfies all of the equations in
(2). A system which does not have any solution is called incompatible. If the system has a
solution, it is called compatible, and it is called a determined system if it has one and only
one solution. It is possible for there to be more than one solution, in which case the system
is called undetermined. The problem of deciding when a given system is compatible, and,
if it is, then what are all of its solutions, is the first series of questions we must answer.
Now let us once again look at the fourth problem in §2. Suppose we first number all
of the interior vertices of the plate from 1 to 416 (the number of such vertices in Fig. 3)
in an arbitrary way. We then add 204 indices for the boundary vertices, and, following
the rule for computing the temperature $t_i$ at the i-th interior vertex, we write down
416 equations of the type

$$t_i = \frac{t_a + t_b + t_c + t_d}{4}.$$

Suppose, for example, that $a, b, c \le 416$ and $d > 416$. Then this equality
can be rewritten as the linear equation

$$4t_i - t_a - t_b - t_c = t_d$$

with right side $t_d$ = -273, -100, -50, 0, 50, 100, or 300. If, on the other hand,
$a, b, c, d \le 416$, then we obtain a similar equation with five t's with indices $\le 416$
on the left and 0 on the right. All of these equations taken together give a square linear
system of the form (2) with n = m = 416. All of the coefficients $a_{ij}$ are equal either
to 0 (most of them), -1, or 4. Is this system compatible and determined? We have
obtained a new, mathematically precise formulation of a qualitative problem. The question
of existence and uniqueness (in this case, of a solution to the linear equations) is very typical
of the questions that arise in many areas of mathematics connected with physical phenomena.
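The passage from the averaging rule to a linear system can be sketched on a much smaller plate. The Python fragment below assumes a plain rectangular grid without holes whose outer ring of vertices is the boundary, and a hypothetical function `boundary_temp` giving the prescribed boundary temperatures; neither the grid size nor the temperatures are those of Fig. 3.

```python
def plate_system(rows, cols, boundary_temp):
    """Extended matrix of the equations 4*t_i - (interior neighbours) = (boundary neighbours)."""
    interior = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    index = {vertex: k for k, vertex in enumerate(interior)}   # numbering of interior vertices
    system = []
    for (r, c) in interior:
        equation = [0] * (len(interior) + 1)    # coefficients followed by the right side
        equation[index[(r, c)]] = 4
        for neighbour in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if neighbour in index:
                equation[index[neighbour]] = -1             # unknown interior temperature
            else:
                equation[-1] += boundary_temp(*neighbour)   # known boundary temperature
        system.append(equation)
    return system

# A 4 x 4 grid: four interior vertices; the top edge is kept at 100, the rest at 0.
for equation in plate_system(4, 4, lambda r, c: 100 if r == 0 else 0):
    print(equation)
```

As in the text, every coefficient is 0, -1 or 4, and the system is square: one equation and one unknown per interior vertex.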


2. Equivalence of linear systems. Suppose that we are given another linear system
having "the same size" as (2):

$$\begin{aligned}
a'_{11}x_1 + a'_{12}x_2 + \cdots + a'_{1n}x_n &= b'_1, \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a'_{m1}x_1 + a'_{m2}x_2 + \cdots + a'_{mn}x_n &= b'_m.
\end{aligned} \qquad (2')$$

We say that the system (2') is obtained from (2) by an elementary transformation of type
(I) if all of the equations in (2) except for the i-th and k-th remain the same, while
the i-th and k-th equations interchange places. On the other hand, if all of the
equations in (2') except the i-th are the same as in (2), while the i-th equation in
(2') has the form

$$(a_{i1} + c\,a_{k1})x_1 + \cdots + (a_{in} + c\,a_{kn})x_n = b_i + c\,b_k, \qquad (*)$$

where c is any number (in other words, $a'_{ij} = a_{ij} + c\,a_{kj}$, $b'_i = b_i + c\,b_k$), then we say
that the system (2') is obtained from (2) by an elementary transformation of type (II).
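The two elementary transformations are easy to express as operations on the extended matrix. The following Python sketch uses hypothetical function names, 0-based equation indices, and exact rational arithmetic from the standard `fractions` module; it assumes i and k are different equations.

```python
from fractions import Fraction

def swap_equations(M, i, k):
    """Type (I): interchange the i-th and k-th equations (0-based indices)."""
    M = [r[:] for r in M]            # work on a copy, leave the input untouched
    M[i], M[k] = M[k], M[i]
    return M

def add_multiple(M, i, k, c):
    """Type (II): replace equation i by (equation i) + c * (equation k), with i != k."""
    assert i != k
    M = [r[:] for r in M]
    M[i] = [a + c * b for a, b in zip(M[i], M[k])]
    return M

# Extended matrix of  x + 2y = 5,  3x + 4y = 6
M = [[Fraction(1), Fraction(2), Fraction(5)],
     [Fraction(3), Fraction(4), Fraction(6)]]
print(add_multiple(M, 1, 0, Fraction(-3)))   # second equation becomes -2y = -9
print(swap_equations(M, 0, 1))
```

Each row of `M` holds the coefficients of one equation followed by its constant term, so a type (II) step acts on the right side in the same way as on the coefficients, exactly as in (*).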

We call two linear systems (2) and (2') equivalent if either both are incompatible, 
or else both are compatible and have the same solutions. Let us denote equivalence of two 
systems (a) and (b) as follows: (a) ~ (b). Note the following properties of equivalence 
of linear systems: (a) ~ (a), (a) ~ (b) implies (b) ~ (a), and (a) ~ (b) and 
(b) ~ (c) together imply (a) ~ (c). The following theorem gives a sufficient condition 


for equivalence. 


THEOREM 1. Two linear systems are equivalent if one is obtained from the other by
applying a finite sequence of elementary transformations.

To prove this, it suffices to prove that two systems (2) and (2') are equivalent if
(2') is obtained from (2) by applying one elementary transformation. Note that in this
case (2) is also obtained from (2') by applying a single elementary transformation, since
each elementary transformation has an inverse elementary transformation. In other words,
in the case of type (I), if we again interchange the i-th and k-th equations, we return
to the original system; and in type (II), if we add (-c) times the k-th equation in (2')
to the i-th equation in (2'), we obtain the i-th equation of (2).


We now prove that any solution $(x_1^0, x_2^0, \ldots, x_n^0)$ of the system (2) is also a
solution of the system (2'). If the elementary transformation used to obtain (2') was of
type (I), then the equations have not changed at all; only the order in which they are written
has changed. Hence, the numbers $x_1^0, x_2^0, \ldots, x_n^0$ which satisfied them before will
satisfy them after the elementary transformation. Next, if the elementary transformation
used to obtain (2') was of type (II), then all of the equations except for the i-th remain
the same, and so the solution $(x_1^0, x_2^0, \ldots, x_n^0)$ satisfies these equations. As for the
i-th equation, in (2') it has the form (*). Since our solution satisfies the i-th and
k-th equations of (2), we have

$$\begin{aligned}
a_{i1}x_1^0 + \cdots + a_{in}x_n^0 &= b_i, \\
a_{k1}x_1^0 + \cdots + a_{kn}x_n^0 &= b_k.
\end{aligned}$$

Multiplying both sides of the second equation by c and adding it to the first equation, and
grouping terms as in (*), we find that (*) holds with $x_j = x_j^0$.

Because, as noted above, the elementary transformations are invertible, it follows
that the same reasoning shows that any solution of (2') is also a solution of (2).
It remains to observe that incompatibility of one system implies incompatibility of
the other. (Use proof by contradiction.)
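The argument can be checked on a concrete system. The Python sketch below (hypothetical helper names, exact rational arithmetic) applies a type (II) transformation, verifies that a solution of the original system also satisfies the new one, and confirms that the inverse transformation restores the original system. It illustrates the proof on one example and is of course not a substitute for it.

```python
from fractions import Fraction

def satisfies(M, x):
    """True if the tuple x satisfies every equation of the extended matrix M."""
    return all(sum(a * xi for a, xi in zip(r[:-1], x)) == r[-1] for r in M)

def add_multiple(M, i, k, c):
    """Type (II): add c times equation k to equation i (0-based indices, i != k)."""
    M = [r[:] for r in M]
    M[i] = [a + c * b for a, b in zip(M[i], M[k])]
    return M

M = [[1, 2, 5], [3, 4, 6]]                       # x + 2y = 5,  3x + 4y = 6
x0 = [Fraction(-4), Fraction(9, 2)]              # its solution
M2 = add_multiple(M, 1, 0, -3)                   # transformed system (2')

print(satisfies(M, x0), satisfies(M2, x0))       # the solution survives the transformation
print(add_multiple(M2, 1, 0, 3) == M)            # adding back 3 times equation 0 undoes it
print(satisfies(M, [1, 2]), satisfies(M2, [1, 2]))   # a non-solution stays a non-solution
```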


3. Reducing to step form. By successively applying elementary transformations,
we can change a given system of equations to a system having a simpler form.

First of all, we may assume that there is at least one non-zero coefficient $a_{i1}$ in
the first column of coefficients; otherwise there would be no point in referring to the
unknown $x_1$. If $a_{11} = 0$, use a transformation of type (I) to interchange the first
equation with a j-th equation for which $a_{j1} \ne 0$. Now the coefficient of the first unknown
in the first equation is non-zero. Let $a'_{11}$ denote this coefficient. Now for each
$i = 2, 3, \ldots, m$, we subtract $c_i$ times the first equation from the i-th equation,
where $c_i$ is chosen so that, after we subtract, the coefficient of $x_1$ becomes 0.
Obviously, the value of $c_i$ which will do this is $c_i = a'_{i1}/a'_{11}$, where $a'_{i1}$ is the
coefficient of $x_1$ in the current i-th equation. We thus apply m-1
elementary transformations of type (II). We now have a system in which $x_1$ only appears
in the first equation.

It can sometimes happen that the second unknown $x_2$ also appears only in the first
equation of our new system. Let $x_k$ be the unknown with the lowest index which appears
in some equation other than the first. We obtain the system

$$\begin{aligned}
a'_{11}x_1 + \cdots\cdots\cdots + a'_{1n}x_n &= b'_1, \\
a'_{2k}x_k + \cdots + a'_{2n}x_n &= b'_2, \\
\cdots\cdots\cdots\cdots\cdots & \\
a'_{mk}x_k + \cdots + a'_{mn}x_n &= b'_m,
\end{aligned} \qquad k > 1, \quad a'_{11} \ne 0.$$

Ignoring the first equation, we now apply the same reasoning as before to the remaining
equations. After several more elementary transformations, our system takes the form

$$\begin{aligned}
a''_{11}x_1 + \cdots\cdots\cdots\cdots + a''_{1n}x_n &= b''_1, \\
a''_{2k}x_k + \cdots\cdots + a''_{2n}x_n &= b''_2, \\
a''_{3l}x_l + \cdots + a''_{3n}x_n &= b''_3, \\
\cdots\cdots\cdots\cdots\cdots & \\
a''_{ml}x_l + \cdots + a''_{mn}x_n &= b''_m.
\end{aligned}$$

Of course, here $a''_{1j} = a'_{1j}$ and $b''_1 = b'_1$, since the first equation was not touched.
We continue to apply this procedure as long as possible. Clearly, we will have to
stop when all the coefficients in the remaining equations of all the remaining unknowns up
through the n-th are zero. We then finally have the system (2) in the form


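$$\begin{aligned}
\bar a_{11}x_1 + \cdots\cdots\cdots\cdots\cdots + \bar a_{1n}x_n &= \bar b_1, \\
\bar a_{2k}x_k + \cdots\cdots\cdots + \bar a_{2n}x_n &= \bar b_2, \\
\bar a_{3l}x_l + \cdots\cdots + \bar a_{3n}x_n &= \bar b_3, \\
\cdots\cdots\cdots\cdots\cdots & \\
\bar a_{rs}x_s + \cdots + \bar a_{rn}x_n &= \bar b_r, \\
0 &= \bar b_{r+1}, \\
\cdots & \\
0 &= \bar b_m.
\end{aligned} \qquad (4)$$

Here $\bar a_{11}, \bar a_{2k}, \bar a_{3l}, \ldots, \bar a_{rs}$ are all nonzero, $1 < k < l < \cdots < s$. It may happen
that r = m, in which case the system (4) has no equations of the form $0 = \bar b_t$. We say
that a system of equations in the form (4) has step form. (This is not the only common
terminology: such a system is sometimes said to be in trapezoidal form or in
quasi-triangular form.)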


THEOREM 2. Every system of linear equations is equivalent to a system in step
form.

The proof follows immediately from the above procedure.

It is sometimes useful to think of the elementary transformations as applied not to
the system but to its extended matrix $(a_{ij} \mid b_i)$. In the same way as Theorem 2 we can
prove

THEOREM 2'. Every matrix can be reduced to step form using elementary
transformations.
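The reduction procedure described in this subsection can be sketched in a few lines of Python acting on the extended matrix. The function name `to_step_form` and the decision to return the list of pivot columns are choices of this illustration; exact rational arithmetic from the standard `fractions` module avoids rounding questions.

```python
from fractions import Fraction

def to_step_form(M):
    """Reduce the extended matrix (a_ij | b_i) to step form using types (I) and (II).

    Returns the reduced matrix and the 0-based columns of the leading coefficients."""
    M = [[Fraction(x) for x in r] for r in M]
    m, n = len(M), len(M[0]) - 1
    pivots, r = [], 0
    for col in range(n):
        # look, among the equations not yet fixed, for a non-zero coefficient in this column
        j = next((i for i in range(r, m) if M[i][col] != 0), None)
        if j is None:
            continue                      # this unknown appears in no remaining equation
        M[r], M[j] = M[j], M[r]           # type (I)
        for i in range(r + 1, m):
            c = M[i][col] / M[r][col]
            M[i] = [a - c * b for a, b in zip(M[i], M[r])]   # type (II)
        pivots.append(col)
        r += 1
        if r == m:
            break
    return M, pivots

M, pivots = to_step_form([[0, 1, 1, 2],
                          [1, 1, 1, 3],
                          [2, 2, 2, 6]])
print(M)        # rows (1, 1, 1 | 3), (0, 1, 1 | 2), (0, 0, 0 | 0)
print(pivots)   # [0, 1]
```

In the example the first equation has to be interchanged with the second, and the third equation reduces to 0 = 0, so r = 2 in the notation of (4).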


4. Studying a system of linear equations. By virtue of Theorems 1 and 2, the
questions of compatibility and determinacy need only be investigated for systems in the
step form (4).

We begin with the question of compatibility. It is obvious that, if the system (4)
contains an equation of the form $0 = \bar b_t$ with $\bar b_t \ne 0$, then this system is incompatible,
since the equation $0 = \bar b_t$ cannot be satisfied by any choice of values of the unknowns. We
now prove that, if there are no such equations in (4), then the system is compatible.

Thus, suppose $\bar b_t = 0$ for t > r. We call the unknowns $x_1, x_k, x_l, \ldots, x_s$
with which the first, second, ..., r-th equations begin principal (or pivotal) variables,
and we call the remaining unknowns, if there are any, free variables. There are r
principal variables in all.

We prescribe arbitrary values to the free variables and substitute these values in
the equations in (4). We then obtain a single equation for $x_s$ (the r-th) of the form
$\bar a x_s = \bar b$ with $\bar a = \bar a_{rs} \ne 0$; such an equation has a unique solution. Substituting this
value $x_s = x_s^0$ in the first r-1 equations and continuing in this way from the bottom
to the top in (4), we see that values for the principal variables are uniquely determined
once we have chosen an arbitrary set of values for the free variables. We have proved


THEOREM 3. A system of linear equations is compatible if and only if, after
reduction to step form, it includes no equations of the form $0 = \bar b_t$ with $\bar b_t \ne 0$. If this
condition holds, then the free variables can be given arbitrary values, and the values of the
principal variables are uniquely determined by the system once the values of the free
variables are chosen.
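The bottom-to-top substitution used in the proof can be sketched as follows. The Python function below expects an extended matrix that is already in step form; its name `solve_step_form` and the convention of passing free-variable values as a dictionary keyed by 0-based column index (defaulting to 0) are assumptions of this illustration.

```python
from fractions import Fraction

def solve_step_form(M, free_values=None):
    """Solve a system given by a step-form extended matrix.

    Returns None if some equation reads 0 = b with b != 0; otherwise returns the
    solution determined by the chosen values of the free variables."""
    free_values = free_values or {}
    n = len(M[0]) - 1
    equations = []                           # pairs (column of leading coefficient, row)
    for r in M:
        lead = next((j for j in range(n) if r[j] != 0), None)
        if lead is None:
            if r[-1] != 0:
                return None                  # incompatible equation 0 = b, b != 0
        else:
            equations.append((lead, r))
    principal = {lead for lead, _ in equations}
    x = [None] * n
    for j in range(n):
        if j not in principal:
            x[j] = Fraction(free_values.get(j, 0))      # free variable: arbitrary value
    for lead, r in reversed(equations):                 # from the bottom to the top
        known = sum(Fraction(r[j]) * x[j] for j in range(lead + 1, n))
        x[lead] = (Fraction(r[-1]) - known) / r[lead]
    return x

step = [[1, 1, 1, 3],
        [0, 1, 1, 2],
        [0, 0, 0, 0]]
print(solve_step_form(step, {2: 5}))         # x3 = 5 chosen freely: x2 = -3, x1 = 1
print(solve_step_form([[1, 1, 2], [0, 0, 1]]))   # None: the second equation is 0 = 1
```

Different choices of the free variable give different solutions of the same system, while the principal variables are forced once that choice is made.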


Assuming now that this compatibility condition holds, we explain when a system is
determined. If the system (4) has free variables, then the system is automatically
undetermined: we can give any values at all to the free variables, and then express the
principal variables in terms of these values, by Theorem 3. But if there are no free
variables -- i.e., all of the unknowns are principal variables -- then, by Theorem 3,
the values of the unknowns are uniquely determined by the system; hence, the system is
determined. Finally, we note that the condition that there be no free variables is
equivalent to: r = n. We have proved the following assertion.

THEOREM 4. A compatible linear system (2) is determined if and only if r = n
in the system (4) in step form that is obtained from (2).
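Theorems 3 and 4 together give a mechanical test for the type of a system. The following self-contained Python sketch (the function name `classify` is hypothetical) reduces the extended matrix to step form, counts the number r of principal variables, and reports which of the three cases occurs.

```python
from fractions import Fraction

def classify(M):
    """Return 'incompatible', 'determined' or 'undetermined' for the extended matrix M."""
    M = [[Fraction(x) for x in r] for r in M]
    m, n = len(M), len(M[0]) - 1
    r = 0                                        # number of principal variables found so far
    for col in range(n):
        j = next((i for i in range(r, m) if M[i][col] != 0), None)
        if j is None:
            continue
        M[r], M[j] = M[j], M[r]                  # type (I)
        for i in range(r + 1, m):
            c = M[i][col] / M[r][col]
            M[i] = [a - c * b for a, b in zip(M[i], M[r])]   # type (II)
        r += 1
        if r == m:
            break
    # the equations after the r-th now read 0 = b_t
    if any(row[-1] != 0 for row in M[r:]):
        return "incompatible"                    # Theorem 3
    return "determined" if r == n else "undetermined"   # Theorem 4

print(classify([[1, 2, 5], [3, 4, 6]]))    # determined
print(classify([[1, 1, 2], [2, 2, 4]]))    # undetermined
print(classify([[1, 1, 2], [2, 2, 5]]))    # incompatible
```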


A square linear system, i.e., one for which m = n, after being reduced to step form,
can also be written in the following triangular form:

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
\cdots\cdots\cdots\cdots &\\
a_{nn}x_n &= b_n,
\end{aligned}
\qquad (5)
$$

if we do not insist that $a_{ii} \neq 0$ for all i. In fact, the form (5) merely means that the
k-th equation in the system does not contain unknowns $x_i$ with $i < k$, and this is
automatically true for systems in step form.

A matrix $(a_{ij})$ whose elements $a_{ij}$ are zero whenever $i > j$ is called upper
triangular. We similarly define a lower triangular matrix.


Theorems 3 and 4 have some useful corollaries. 


COROLLARY 1. A linear system (2) in which m = n is compatible and determined
if and only if, after reduction to the step form (5), all of the $a_{ii}$ are non-zero. □


Notice that the condition in Corollary 1 does not depend on the right side of the
system of equations. Thus, when m = n, the system (2) is compatible and determined
if and only if the corresponding homogeneous system (25) is compatible and determined.
But a homogeneous system is always compatible; for example, it always has the zero
solution $x_1^0 = 0, x_2^0 = 0, \ldots, x_n^0 = 0$.

The condition that all of the $a_{ii}$ are non-zero means that the homogeneous system
has only the zero solution. We thereby obtain another form of Corollary 1 not involving
the step form of the system.


COROLLARY 1'. A linear system (2) in which m = n is compatible and determined
if and only if the associated homogeneous system (25) has only the zero solution.




Special attention should also be given to the case n > m.

COROLLARY 2. A compatible system (2) with n > m is never determined. In
particular, a homogeneous system with n > m always has a non-zero solution.


In fact, we always have $r \le m$, since the system (4) does not have more equations
than the system (2) from which it was derived. Hence, if n > m, it follows that n > r,
and so, by Theorem 4, the system (2) is undetermined. It remains to note that a
homogeneous system is undetermined if and only if it has a non-zero solution.
Some of our results are summarized in the following table.

Type of linear system  |  general              |  homogeneous          |  non-homogeneous, n > m
Number of solutions    |  0, 1, or infinitely  |  1 or infinitely      |  0 or infinitely
                       |  many                 |  many                 |  many

5. Some remarks and examples. The method just given for solving systems of
linear equations is called Gauss's method or the method of successive elimination. The
method is very convenient for small n, and also for computer solution in the case of large 
n (although for a variety of reasons it is often more practical to use other methods, for 
example, iteration methods). This method is especially useful when the coefficients are 
fixed, and we are looking for a solution with a specified degree of accuracy. However, in 
theoretical investigations, it is often of greater importance to find compatibility or 
determinacy conditions for a linear system and also to find general formulas for the solutions 
in terms of the coefficients and the constant terms -- without reducing the system to step
form. To some extent Corollary 1' is of this type (i.e., not requiring reduction to step
form).

Example 1. We again return to the heated plate problem of §2. As we saw in the
first subsection of §3, the question that interests us can be stated in terms of the properties
of a certain very concrete linear system (which we denote the HP system), which has a
rather large number of unknowns $t_i$. Following the criterion in Corollary 1', we consider
the homogeneous linear system HHP associated to HP. In other words, we now take the
temperature of all boundary vertices to be identically zero. Let e be the index of an
interior vertex having maximal value $|t_e|$. Then the condition

$$ t_e = \frac{t_a + t_b + t_c + t_d}{4} $$

(where a, b, c, d index the four vertices neighboring e) implies that
$|t_a| = |t_b| = |t_c| = |t_d| = |t_e|$. Moving one vertex at a time in each of the
four directions, we similarly find that each vertex i we pass through has $|t_i| = |t_e|$.
Eventually we reach a boundary vertex, having temperature zero. Hence $t_e = 0$, and so
$t_i = 0$ for all i. Thus, the system HHP has only the zero solution, and so the system
HP is compatible and determined. This solves the heated plate problem: there is one and
only one possible distribution of temperatures.


Example 2. Consider the linear system

$$
\begin{aligned}
x_1 &= 1,\\
x_2 &= 1,\\
-x_1 - x_2 + x_3 &= 0,\\
\cdots\cdots\cdots &\\
-x_{n-2} - x_{n-1} + x_n &= 0.
\end{aligned}
$$

This is obviously a compatible and determined system, which already has a step (triangular)
form, except that it must be solved from top to bottom, instead of from bottom to top as
with (5). By definition, its solution is the first n numbers in the sequence of Fibonacci
numbers $f_1, f_2, \ldots, f_n, \ldots$. These numbers are connected with a certain botanical
phenomenon, called phyllotaxis (the arrangement of leaves on a stem). It would be nice to
have an expression (an analytic formula) for the n-th Fibonacci number when n = 1000,
or even for arbitrary n. You might object that, with patience, even $f_{1000}$ can be
computed using the inductive definition of these numbers. But this is not what we mean. In
Chapters 2 and 3 we shall give two expressions for $f_n$ (although, in the case of this
specific problem we could proceed more directly, without waiting for the general techniques).
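The top-to-bottom solution of the system in Example 2 can be sketched in a few lines of Python. This is only the "patient" inductive computation mentioned above, not the analytic formula promised for Chapters 2 and 3; the function name `solve_top_down` is an assumption of this illustration.

```python
def solve_top_down(n):
    """Solve x1 = 1, x2 = 1, -x[k-2] - x[k-1] + x[k] = 0 (k = 3..n) from the top."""
    x = []
    for k in range(1, n + 1):
        if k <= 2:
            x.append(1)                 # the first two equations fix x1 and x2
        else:
            x.append(x[-2] + x[-1])     # k-th equation, solved for x_k
    return x

first_ten = solve_top_down(10)
```

Because Python integers have arbitrary precision, `solve_top_down(1000)[-1]` gives the exact value of the 1000-th Fibonacci number.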


Remark. It is sometimes more convenient to find a solution to a linear system without
reducing it to step form. This is especially the case when the matrix of the system
contains many zeros. Here some practice in doing this is more useful than reading lengthy
explanations.


§4. Determinants of small order


When presenting Gauss's method, we did not much care about the values of the 
coefficients of the principal variables. It was only important for these coefficients to be 
non-zero. We now do a more careful job of eliminating unknowns, at least in the case of 
square linear systems of small size. This will give us some food for thought, and a starting
point for constructing a more general theory of determinants in Chapter 3. 


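As in §3, we consider a system of two equations in two unknowns

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 &= b_1,\\
a_{21}x_1 + a_{22}x_2 &= b_2,
\end{aligned}
\qquad (1)
$$

and we try to find general formulas for the components $x_1^0, x_2^0$ of its solution. By the
determinant of the matrix $\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix}$
we mean the expression $a_{11}a_{22} - a_{21}a_{12}$; we denote the determinant as follows:
$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}$. To every square 2 x 2 matrix
we thereby associate a number

$$
\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}. \qquad (2)
$$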

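If we try to eliminate $x_2$ from the system (1) by multiplying the first equation by $a_{22}$
and adding it to $(-a_{12})$ times the second equation, we obtain

$$
\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} x_1 = b_1a_{22} - b_2a_{12}.
$$

The right side is nothing other than the determinant of the matrix
$\begin{pmatrix} b_1 & a_{12}\\ b_2 & a_{22} \end{pmatrix}$. We suppose
that $\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} \neq 0$. We then have

$$
x_1 = \frac{\begin{vmatrix} b_1 & a_{12}\\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}}, \qquad
x_2 = \frac{\begin{vmatrix} a_{11} & b_1\\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}}. \qquad (3)
$$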
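As a small numerical check of the formulas (3), here is a Python sketch. The function names `det2` and `cramer2` and the argument order are assumptions made for this illustration; exact rational arithmetic is used so that no rounding occurs.

```python
from fractions import Fraction

def det2(a11, a12, a21, a22):
    """Second order determinant, as in (2): a11*a22 - a21*a12."""
    return a11 * a22 - a21 * a12

def cramer2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1, a21*x1 + a22*x2 = b2 by the formulas (3)."""
    d = det2(a11, a12, a21, a22)
    if d == 0:
        raise ValueError("determinant is zero; the formulas (3) do not apply")
    x1 = Fraction(det2(b1, a12, b2, a22), d)   # first column replaced by b
    x2 = Fraction(det2(a11, b1, a21, b2), d)   # second column replaced by b
    return x1, x2

# 2*x1 + x2 = 5,  x1 - x2 = 1
solution = cramer2(2, 1, 1, -1, 5, 1)
```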
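Once we have formulas for finding the solutions of a system of two equations with two
unknowns, we can also solve certain other systems. For example, consider a system of
two homogeneous equations with three unknowns:

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= 0,\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= 0.
\end{aligned}
\qquad (4)
$$

We are interested in finding a non-zero solution of this system, i.e., a solution for which at
least one $x_i \neq 0$. Suppose, for example, that $x_3 \neq 0$. Dividing both sides of the two
equations by $-x_3$ and setting $y_1 = -x_1/x_3$ and $y_2 = -x_2/x_3$, we rewrite (4) in the
same form as (1):

$$
\begin{aligned}
a_{11}y_1 + a_{12}y_2 &= a_{13},\\
a_{21}y_1 + a_{22}y_2 &= a_{23}.
\end{aligned}
$$

If we assume that $\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} \neq 0$, then the formulas (3) give

$$
y_1 = -\frac{x_1}{x_3} = \frac{\begin{vmatrix} a_{13} & a_{12}\\ a_{23} & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}}, \qquad
y_2 = -\frac{x_2}{x_3} = \frac{\begin{vmatrix} a_{11} & a_{13}\\ a_{21} & a_{23} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}}.
$$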


It is not surprising that, starting with (4), we determined not $x_1, x_2, x_3$ themselves but
rather their ratios. We immediately see from the homogeneity of the system that, if
$(x_1^0, x_2^0, x_3^0)$ is a solution and c is any number, then $(cx_1^0, cx_2^0, cx_3^0)$ is also a
solution. Thus, we can set

$$
x_1 = -\begin{vmatrix} a_{13} & a_{12}\\ a_{23} & a_{22} \end{vmatrix}, \qquad
x_2 = -\begin{vmatrix} a_{11} & a_{13}\\ a_{21} & a_{23} \end{vmatrix}, \qquad
x_3 = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} \qquad (5)
$$

and say that any solution is obtained from this solution by multiplying all of the $x_i$ by some
number c. We can give these formulas a more symmetric appearance if we note that
always

$$
\begin{vmatrix} a & b\\ c & d \end{vmatrix} = -\begin{vmatrix} b & a\\ d & c \end{vmatrix},
$$

as is clear from (2). Hence, (5) can be written in the form

$$
x_1 = \begin{vmatrix} a_{12} & a_{13}\\ a_{22} & a_{23} \end{vmatrix}, \qquad
x_2 = -\begin{vmatrix} a_{11} & a_{13}\\ a_{21} & a_{23} \end{vmatrix}, \qquad
x_3 = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}. \qquad (6)
$$

These formulas were derived under the assumption that
$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} \neq 0$. It is
not hard to see that, as long as at least one of the determinants in (6) is non-zero, it is
still true that the solutions of (4) are precisely the multiples of the triple in (6). However,
if all three determinants are zero, then, while (6) still gives a solution of (4) (the zero
solution), it is no longer the case that all solutions can be obtained from (6) by multiplying
the three determinants by some number. For example, consider the system consisting of
two identical equations $x_1 + x_2 + x_3 = 0$.
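The recipe (6) is easy to try out numerically. The Python sketch below is an illustration under assumed names (`det2`, `homogeneous_solution`); it builds the triple in (6) from the two rows of coefficients of (4) and lets one verify by substitution that the triple satisfies both equations.

```python
def det2(a, b, c, d):
    """Second order determinant |a b; c d| = a*d - c*b."""
    return a * d - c * b

def homogeneous_solution(row1, row2):
    """Return the triple (x1, x2, x3) given by the formulas (6)."""
    a11, a12, a13 = row1
    a21, a22, a23 = row2
    x1 = det2(a12, a13, a22, a23)
    x2 = -det2(a11, a13, a21, a23)
    x3 = det2(a11, a12, a21, a22)
    return (x1, x2, x3)

def satisfies(row, x):
    """Check that row[0]*x1 + row[1]*x2 + row[2]*x3 = 0."""
    return sum(a * v for a, v in zip(row, x)) == 0

x = homogeneous_solution((1, 2, 3), (4, 5, 6))
degenerate = homogeneous_solution((1, 1, 1), (1, 1, 1))
```

For the two identical equations the triple degenerates to the zero solution, which illustrates why the non-zero solutions, such as (1, -1, 0), are not multiples of it.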


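We now proceed to the case of a system of three equations with three unknowns:

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1,\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2,\\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3.
\end{aligned}
$$

We would like to eliminate $x_2$ and $x_3$ from this system, in order to obtain a value for
$x_1$. To do this we multiply the first equation by $c_1$, the second by $c_2$, and the third
by $c_3$ and add them. We choose $c_1, c_2, c_3$ in such a way that the resulting equation

$$
(a_{11}c_1 + a_{21}c_2 + a_{31}c_3)x_1 + (a_{12}c_1 + a_{22}c_2 + a_{32}c_3)x_2 + (a_{13}c_1 + a_{23}c_2 + a_{33}c_3)x_3 = b_1c_1 + b_2c_2 + b_3c_3
$$

has zero coefficient of $x_2$ and $x_3$. Setting these two coefficients equal to zero gives the
following system of equations for the unknowns $c_1, c_2, c_3$:

$$
\begin{aligned}
a_{12}c_1 + a_{22}c_2 + a_{32}c_3 &= 0,\\
a_{13}c_1 + a_{23}c_2 + a_{33}c_3 &= 0.
\end{aligned}
$$

These equations are of the same type as (4). Hence we can take

$$
c_1 = \begin{vmatrix} a_{22} & a_{32}\\ a_{23} & a_{33} \end{vmatrix}, \qquad
c_2 = -\begin{vmatrix} a_{12} & a_{32}\\ a_{13} & a_{33} \end{vmatrix}, \qquad
c_3 = \begin{vmatrix} a_{12} & a_{22}\\ a_{13} & a_{23} \end{vmatrix}.
$$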

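After using these values of $c_1, c_2, c_3$ to combine the three equations, we obtain the
following equation for $x_1$:

$$
\left( a_{11}\begin{vmatrix} a_{22} & a_{32}\\ a_{23} & a_{33} \end{vmatrix}
- a_{21}\begin{vmatrix} a_{12} & a_{32}\\ a_{13} & a_{33} \end{vmatrix}
+ a_{31}\begin{vmatrix} a_{12} & a_{22}\\ a_{13} & a_{23} \end{vmatrix} \right) x_1
= b_1\begin{vmatrix} a_{22} & a_{32}\\ a_{23} & a_{33} \end{vmatrix}
- b_2\begin{vmatrix} a_{12} & a_{32}\\ a_{13} & a_{33} \end{vmatrix}
+ b_3\begin{vmatrix} a_{12} & a_{22}\\ a_{13} & a_{23} \end{vmatrix}. \qquad (7)
$$

The coefficient of $x_1$ in (7) is called the determinant of the matrix
$\begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix}$ and
is denoted $\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}$.
Thus, we take the third order determinant to be the expression

$$
\begin{aligned}
\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}
&= a_{11}\begin{vmatrix} a_{22} & a_{32}\\ a_{23} & a_{33} \end{vmatrix}
- a_{21}\begin{vmatrix} a_{12} & a_{32}\\ a_{13} & a_{33} \end{vmatrix}
+ a_{31}\begin{vmatrix} a_{12} & a_{22}\\ a_{13} & a_{23} \end{vmatrix}\\
&= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}
- a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32},
\end{aligned}
\qquad (8)
$$

which we have defined using second order determinants. Now notice that the right side in
(7) can be obtained from the coefficient of $x_1$ on the left by replacing $a_{11}$ by $b_1$, $a_{21}$
by $b_2$, and $a_{31}$ by $b_3$. Hence, equation (7) can be written in the form

$$
\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix} x_1
= \begin{vmatrix} b_1 & a_{12} & a_{13}\\ b_2 & a_{22} & a_{23}\\ b_3 & a_{32} & a_{33} \end{vmatrix}.
$$

Suppose that the coefficient of $x_1$ here is non-zero. Then, if we carry out
analogous computations for $x_2$ and $x_3$ we arrive at the formulas

$$
x_1 = \frac{\begin{vmatrix} b_1 & a_{12} & a_{13}\\ b_2 & a_{22} & a_{23}\\ b_3 & a_{32} & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}}, \quad
x_2 = \frac{\begin{vmatrix} a_{11} & b_1 & a_{13}\\ a_{21} & b_2 & a_{23}\\ a_{31} & b_3 & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}}, \quad
x_3 = \frac{\begin{vmatrix} a_{11} & a_{12} & b_1\\ a_{21} & a_{22} & b_2\\ a_{31} & a_{32} & b_3 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}}. \qquad (9)
$$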
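The construction of the third order determinant from second order ones, and the resulting formulas (9), can be sketched in Python as follows. The names `det2`, `det3`, `det3_six_terms` and `cramer3` are assumptions of this illustration; a matrix is represented as a list of three rows.

```python
from fractions import Fraction

def det2(a, b, c, d):
    """Second order determinant |a b; c d| = a*d - c*b."""
    return a * d - c * b

def det3(m):
    """Third order determinant built from second order ones, as in (8)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * det2(a22, a32, a23, a33)
            - a21 * det2(a12, a32, a13, a33)
            + a31 * det2(a12, a22, a13, a23))

def det3_six_terms(m):
    """The same determinant written out as six products."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
            - a13 * a22 * a31 - a12 * a21 * a33 - a11 * a23 * a32)

def cramer3(m, b):
    """Solve a 3 x 3 system by the formulas (9); requires det3(m) != 0."""
    d = det3(m)
    if d == 0:
        raise ValueError("determinant is zero; the formulas (9) do not apply")
    solution = []
    for j in range(3):
        # Replace the j-th column of m by the column of constant terms.
        mj = [list(row[:j]) + [b[i]] + list(row[j + 1:]) for i, row in enumerate(m)]
        solution.append(Fraction(det3(mj), d))
    return solution

m = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
x = cramer3(m, [6, 15, 25])
```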


Clearly, the same reasoning can be applied to a system of four, five, and so on
equations in an equal number of unknowns. To treat the case of four equations, we must
first derive formulas similar to (6) for the solutions of a homogeneous system of three
equations with four unknowns; then we eliminate $x_2, x_3, x_4$ in the system of four equations
with four unknowns by multiplying the equations by $c_1, c_2, c_3, c_4$ and adding them. We
find the values of the $c_i$ (i = 1, 2, 3, 4) by solving a system of three homogeneous
equations.

By analogy with (8), we define the fourth order determinant to be the coefficient of
$x_1$ in the resulting equation; it will be built up from third order determinants. Carrying
out the same procedure for $x_2, x_3, x_4$, we find formulas analogous to (9) for the $x_i$.

We can continue in this way indefinitely. We can be sure that we will eventually be
able to solve systems of n equations with n unknowns for any n, because of a
principle that is widely used in mathematics: the principle of mathematical induction
(see §7).


EXERCISES 


1. Formula (8) can be easily remembered if one uses a visual device which gives
the rule for the sign of the products which occur in the third order determinant (see Fig. 4).
Find a similar visual rule for the sign in the fourth order determinant.

[Fig. 4]


2. Show that it is impossible for all six terms in the expansion of the third order
determinant to be simultaneously positive.

3. The square of the area of the parallelogram which is constructed using the vectors
from the origin to the points P, Q with rectangular coordinates $(\alpha, \beta)$ and $(\gamma, \delta)$ (see
Fig. 5), is given by the formula

$$
\begin{vmatrix} \alpha^2 + \beta^2 & \alpha\gamma + \beta\delta\\ \alpha\gamma + \beta\delta & \gamma^2 + \delta^2 \end{vmatrix}.
$$

[Fig. 5]

(This is easy to see, if one changes to a coordinate system in which P lies on the x-axis;
one must check that the determinant on the right does not change under a change of
coordinates.) Find a similar expression for the square of the volume of a parallelepiped in
three-dimensional space; use a third order determinant.


§5. Sets and mappings


In the preceding two sections we have encountered various sorts of sets of elements, 
and various sorts of mappings between sets. The set of solutions of a given system of linear 
equations, the rule which associates to every 2x2 matrix its determinant -- these are 
only special cases of certain formal notions with which it is important to become familiar,
at least on an intuitive level, as soon as possible.


1. Sets. By a set we mean a collection of objects, which are called the elements of
the set. A set with finitely many elements can be described by explicitly enumerating all of
its elements; these elements are usually enclosed in braces. For example, {1, 2, 4, 8} is
the set of powers of two between 1 and 10. As a rule, a set is denoted by a capital
letter in some alphabet, and an element in a set is denoted by a small letter in the same or
another alphabet. Certain notations for some of the most important sets have become
standard, and should be consistently used. Thus, the letters $\mathbf{N}, \mathbf{Z}, \mathbf{Q}, \mathbf{R}$ denote the set
of positive integers (natural numbers), the set of all integers, the set of rational numbers,
and the set of real numbers, respectively. For a given set S, the symbol $a \in S$ means
that a is an element of the set S; if a is not an element of S, we write $a \notin S$. We
say that S is a subset of a set T or write $S \subset T$ (S is contained in T), if we have
the implication

$$ \forall x, \quad x \in S \Longrightarrow x \in T. $$

(Concerning this notation, see "Advice to the Reader" at the beginning of the book.) Two sets
S and T are said to coincide (to be equal) if they have the same elements. Symbolically:

$$ S = T \Longleftrightarrow S \subset T \ \text{and} \ T \subset S. $$

($\Longleftrightarrow$ means "if and only if", i.e., "two-way implication".) By definition, the empty set
$\emptyset$, which is the set without any elements, is a subset of every set. If $S \subset T$, but
$S \neq \emptyset$ and $S \neq T$, then S is called a proper subset of T. Subsets $S \subset T$ are often
defined by giving a property which only elements of S possess. For example,

$$ \{n \in \mathbf{Z} \mid n = 2m \ \text{for some} \ m \in \mathbf{Z}\} $$

is the set of all even integers, and

$$ \mathbf{N} = \{n \in \mathbf{Z} \mid n > 0\} $$

is the set of natural numbers.
By the intersection of two sets S and T we mean the set

$$ S \cap T = \{x \mid x \in S \ \text{and} \ x \in T\}, $$

and by their union we mean the set

$$ S \cup T = \{x \mid x \in S \ \text{or} \ x \in T\}. $$

The intersection $S \cap T$ might be the empty set. In that case we say that S and T
are disjoint sets. The operations of intersection and union satisfy the identities

$$ R \cap (S \cup T) = (R \cap S) \cup (R \cap T), $$

$$ R \cup (S \cap T) = (R \cup S) \cap (R \cup T), $$

the verification of which we leave to the reader as an exercise. The diagrams


By the difference $S \setminus T$ of the sets S and T we mean the set of all elements of
S which are not elements of T. Here we do not require $T \subset S$. The notation S - T
is sometimes used instead of $S \setminus T$.

If T is a subset of S, then the difference $S \setminus T$ is also called the complement
of T in S. If we set $R = S \setminus T$, then we have: $R \cap T = \emptyset$, $R \cup T = S$. Notice that
there is a correspondence between the operations of intersection, union, and complement,
and the logical connectives "and", "or", "not".
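Finite sets can be experimented with directly in Python, whose built-in `set` type provides intersection (`&`), union (`|`) and difference (`-`). The sketch below is only a spot check on particular sets, not a proof of the identities; the helper names `distributive_laws_hold` and `complement` are assumptions of this illustration.

```python
def distributive_laws_hold(R, S, T):
    """Check both distributive identities for the given finite sets."""
    first = R & (S | T) == (R & S) | (R & T)
    second = R | (S & T) == (R | S) & (R | T)
    return first and second

def complement(T, S):
    """Complement of T in S; defined here only when T is a subset of S."""
    if not T <= S:
        raise ValueError("T must be a subset of S")
    return S - T

R, S, T = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}
C = complement({1, 2}, {1, 2, 3})
```

The expression `x in S and x in T` mirrors the connective "and" in the definition of intersection, just as `or` and `not` mirror union and complement.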

Now let X and Y be arbitrary sets. A pair of elements (x, y), where $x \in X$
and $y \in Y$, which is taken in a definite order, is called an ordered pair. We consider
two ordered pairs $(x_1, y_1)$ and $(x_2, y_2)$ to be equal if and only if $x_1 = x_2$ and $y_1 = y_2$.
The cartesian product of two sets X and Y is the set of all ordered pairs (x, y):

$$ X \times Y = \{(x, y) \mid x \in X, \ y \in Y\}. $$


For example, let $\mathbf{R}$ be the set of all real numbers. Then the cartesian product
$\mathbf{R}^2 = \mathbf{R} \times \mathbf{R}$ is simply the set of all of the cartesian coordinates of the points on the plane
relative to a fixed choice of coordinate axes.

In a similar way we can introduce the cartesian product $X_1 \times X_2 \times X_3$ of three sets
(this is $(X_1 \times X_2) \times X_3$, or equivalently $X_1 \times (X_2 \times X_3)$), the cartesian product of four
sets, and so on. If $X_1 = X_2 = \cdots = X_k = X$, we abbreviate $X \times X \times \cdots \times X = X^k$ and
call this the k-th cartesian power of the set X. The elements of $X^k$ are sequences
(rows) of length k: $(x_1, x_2, \ldots, x_k)$.

In order to get a feeling for the difference between the sets $X \times Y$ and $X \cup Y$,
we take the case when X and Y are sets with finitely many elements (of finite
cardinality; the number of elements in a set is called its "cardinality" and is denoted Card):

$$ |X| = \operatorname{Card} X = n, \qquad |Y| = \operatorname{Card} Y = m. $$

Then

$$ |X \times Y| = nm, \quad \text{while} \quad |X \cup Y| = n + m - |X \cap Y|. $$

If these equalities are not immediately clear, the reader should carefully reread all of the
definitions.
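The two cardinality formulas can be checked on a small example. In this Python sketch the sets X and Y and the helper name `cartesian_product` are choices made for the illustration; `itertools.product` from the standard library enumerates the ordered pairs.

```python
from itertools import product

def cartesian_product(X, Y):
    """The set of all ordered pairs (x, y) with x in X and y in Y."""
    return {(x, y) for x, y in product(X, Y)}

X = {"a", "b", "c"}          # n = 3
Y = {"c", "d"}               # m = 2, with one element in common with X

pairs = cartesian_product(X, Y)
size_of_product = len(pairs)             # n * m
size_of_union = len(X | Y)               # n + m - |X ∩ Y|
```

Note that ("c", "c") is an element of the product even though "c" is counted only once in the union.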


2. Mappings. The notion of a function or mapping (also: "map") plays a central
role in mathematics. Given two sets X and Y, a mapping f with domain of definition
X and range of values Y associates to every element $x \in X$ an element $f(x) \in Y$,
which can also be denoted fx. In the case Y = X we also call f a transformation of
the set X to itself. A mapping is written symbolically in the form $f: X \to Y$ or
$X \xrightarrow{f} Y$. The image of a mapping f is the set of all elements of the form f(x):

$$ \operatorname{Im} f = \{f(x) \mid x \in X\} = f(X) \subset Y. $$

The set

$$ f^{-1}(y) = \{x \in X \mid f(x) = y\} $$

is called the preimage of the element $y \in Y$. More generally, for $Y_0 \subset Y$ we set

$$ f^{-1}(Y_0) = \{x \in X \mid f(x) \in Y_0\} = \bigcup_{y \in Y_0} f^{-1}(y). $$

If $y \in Y \setminus \operatorname{Im} f$, then obviously $f^{-1}(y) = \emptyset$.

A mapping $f: X \to Y$ is called a surjective or an onto mapping if $\operatorname{Im} f = Y$; it
is called an injective mapping if $x \neq x'$ implies $f(x) \neq f(x')$. Finally, $f: X \to Y$ is
called a bijective mapping or a one-to-one correspondence if it is both surjective and injective.

To say that two mappings f and g are equal means that their domains and ranges
are the same: $X \xrightarrow{f} Y$, $X \xrightarrow{g} Y$; and that $f(x) = g(x)$, $\forall x \in X$. The symbol $x \mapsto f(x)$
denotes the correspondence of a value $f(x) \in Y$ to the "argument" x, i.e., to an
element $x \in X$.

To take an example, let $f_n$ be the n-th Fibonacci number (see §3). The
correspondence $n \mapsto f_n$ gives a mapping $\mathbf{N} \to \mathbf{N}$. The mapping is obviously not
surjective, and is also not injective, since $f_1 = f_2 = 1$. Another example: if $\mathbf{R}_+$ is the
set of non-negative real numbers, then the mappings $f: \mathbf{R} \to \mathbf{R}$, $g: \mathbf{R} \to \mathbf{R}_+$, $h: \mathbf{R}_+ \to \mathbf{R}_+$
defined by the same rule $x \mapsto x^2$, are all different mappings. Here f is neither
surjective nor injective; g is surjective but not injective; and h is bijective. Thus, the
specification of the domain of definition and range of values is an essential part of defining a
mapping (function).
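The dependence of injectivity and surjectivity on the chosen domain and range can be seen on a finite toy analogue of the example $x \mapsto x^2$. In the Python sketch below the finite sets `D`, `C`, `D_plus`, `C_plus` are stand-ins chosen for illustration (they are not the real line), and the helper names are assumptions of this example.

```python
def is_injective(f, X):
    """True if distinct elements of X have distinct images."""
    images = [f(x) for x in X]
    return len(set(images)) == len(images)

def is_surjective(f, X, Y):
    """True if the image of X under f is all of Y."""
    return {f(x) for x in X} == set(Y)

def square(x):
    return x * x

D = range(-3, 4)           # stand-in for the domain R
C = range(-9, 10)          # stand-in for the range R
D_plus = range(0, 4)       # stand-in for the non-negative domain
C_plus = {0, 1, 4, 9}      # the values actually attainable as squares

f_props = (is_injective(square, D), is_surjective(square, D, C))
g_props = (is_injective(square, D), is_surjective(square, D, C_plus))
h_props = (is_injective(square, D_plus), is_surjective(square, D_plus, C_plus))
```

The same rule gives three mappings with three different sets of properties, exactly because the domain and range differ.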

The identity mapping $e_X: X \to X$ is the mapping which takes every element $x \in X$
to itself. If X is a subset of Y ($X \subset Y$), it is sometimes useful to consider the
inclusion mapping $i: X \to Y$, which takes every element $x \in X$ to the same element,
now regarded as an element of Y. A mapping $f: X \to Y$ is called a restriction of the
map $g: X' \to Y'$ if $X \subset X'$, $Y \subset Y'$, and $f(x) = g(x)$, $\forall x \in X$. In this situation g
is called an extension of f. For example, the inclusion $i: X \to Y$ is a restriction of the
identity mapping $e_Y: Y \to Y$.


We shall also have occasion to speak of functions of several variables. It is worthwhile
to convince oneself that, if we use the notion of a cartesian power $X^n$ of a set X
(see above), we can then treat a function $f(x_1, \ldots, x_n)$ of several variables $x_i \in X$,
i = 1, ..., n, as an ordinary function $f: X^n \to Y$ of one variable $x \in X^n$.

The product (composition) of two mappings $g: U \to V$ and $f: V \to W$ is the
mapping $f \circ g: U \to W$ which is defined by

$$ (f \circ g)(u) = f(g(u)), \quad \forall u \in U. $$

This definition can be depicted visually by means of the triangular diagram

[Diagram: a triangle with arrows $g: U \to V$, $f: V \to W$, and $f \circ g: U \to W$]

We say that this diagram "commutes" (or "is commutative"), i.e., the result of going from
U to W does not depend on whether we go directly using $f \circ g$ or via V, using f
and g. Note that the composition is not defined for just any mappings f and g. In the
above notation, it is necessary that the same set V be both the range of g and the
domain of f. The composition of two mappings from a set X to itself always makes
sense.

We shall henceforth write simply fg instead of $f \circ g$.

An obvious verification shows that for any mapping $f: X \to Y$ we have

$$ f e_X = f = e_Y f. $$

An important property of the composition of mappings is given in the following

THEOREM 1. Composition obeys the associative law. This means that, if
$h: U \to V$, $g: V \to W$, and $f: W \to T$ are three mappings, then

$$ f(gh) = (fg)h. $$

Proof. The necessary argument is expressed in the following diagram:

[Diagram: arrows $h: U \to V$, $g: V \to W$, $f: W \to T$, together with $\alpha: U \to W$ and $\beta: V \to T$]

where $\alpha = gh$, $\beta = fg$. According to the definition of equality of mappings, we need only
compare the values of the mappings $f(gh): U \to T$ and $(fg)h: U \to T$ at an arbitrary
element $u \in U$. But, by definition of composition, we have

$$ (f(gh))u = f((gh)u) = f(g(hu)) = (fg)(hu) = ((fg)h)u. \quad □ $$

In general, the composition of mappings $X \to X$ is not commutative, i.e.,
$fg \neq gf$. One can see this right away from the example of a two-element set X = {a, b} with
f(a) = b, f(b) = a, g(a) = a, g(b) = a. Another example: let f and g be constant
mappings from X to X, i.e., the values f(x) and g(x) do not depend on x. Then
$f \neq g \Longrightarrow fg \neq gf$.
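The two-element example can be worked out mechanically. In this Python sketch a mapping of a finite set is stored as a dictionary; the representation and the helper name `compose` are assumptions of the illustration, and the mapping `h` is an extra example added only to try out the associative law of Theorem 1.

```python
def compose(f, g):
    """Return fg, i.e. the mapping x -> f(g(x)), for mappings stored as dicts."""
    return {x: f[g[x]] for x in g}

f = {"a": "b", "b": "a"}     # f swaps a and b
g = {"a": "a", "b": "a"}     # g is the constant mapping with value a
h = {"a": "b", "b": "b"}     # another constant mapping (illustrative)

fg = compose(f, g)           # x -> f(g(x))
gf = compose(g, f)           # x -> g(f(x))
```

Here `fg` sends both elements to b while `gf` sends both to a, so the two compositions differ.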

Some functions have inverses. Suppose that $f: X \to Y$ and $g: Y \to X$ are any
two mappings; then the compositions fg and gf are defined. If $fg = e_Y$, then f
is called a left inverse of g, and g is called a right inverse of f. If the product in
either order is the identity map:

$$ fg = e_Y, \qquad gf = e_X, \qquad (1) $$

then we call g the two-sided inverse (or, simply, the inverse) of f (in which case f
is the inverse of g), and we denote g by the symbol $f^{-1}$. Thus,

$$ f(u) = v \Longleftrightarrow f^{-1}(v) = u. $$

If there were another mapping $g': Y \to X$ for which

$$ fg' = e_Y, \qquad g'f = e_X, \qquad (1') $$

we could conclude, using (1), (1'), and Theorem 1, that

$$ g' = e_X g' = (gf)g' = g(fg') = g e_Y = g. $$

Thus, whenever a two-sided inverse of f exists, it is unique. Thus, the notation $f^{-1}$ is
unambiguous.


THEOREM 2. A mapping $f: X \to Y$ has an inverse if and only if it is bijective.

The proof of the theorem is based on the following lemma, which is useful in its own
right.


LEMMA. If $f: X \to Y$ and $g: Y \to X$ are any mappings for which $gf = e_X$,
then f is injective and g is surjective.

To prove the lemma, first suppose that $x, x' \in X$ and $f(x) = f(x')$. Then
$x = e_X(x) = (gf)x = g(fx) = g(fx') = (gf)x' = e_X(x') = x'$. Thus, f is injective. Next,
if x is any element of X, then $x = e_X(x) = (gf)x = g(fx)$, and this proves that g
is surjective.

Returning to Theorem 2, first suppose that f has an inverse $g = f^{-1}$. Then the
equations (1) and the lemma give both injectivity and surjectivity of f. In other words,
f is bijective. Conversely, if we suppose that f is bijective, for any $y \in Y$ we can
find a unique element $x \in X$ for which $f(x) = y$. Setting $g(y) = x$, we define a mapping
$g: Y \to X$ having the properties (1). Thus, $f^{-1}$ exists. □
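The converse half of the proof is constructive: for a bijective f one sets g(y) = x whenever f(x) = y. For finite sets this can be sketched in Python as follows; the dictionary representation and the function name `inverse` are assumptions of this illustration.

```python
def inverse(f, Y):
    """Return the two-sided inverse of f: X -> Y as a dict, or None if f is not bijective.

    f is a dict whose keys form the domain X; Y is the range of values.
    """
    g = {}
    for x, y in f.items():
        if y in g:
            return None          # two elements with the same image: f is not injective
        g[y] = x                 # set g(y) = x, where f(x) = y
    if set(g) != set(Y):
        return None              # some y has empty preimage: f is not surjective
    return g

f = {1: "a", 2: "b", 3: "c"}
g = inverse(f, {"a", "b", "c"})
```

One can then check both equalities in (1): `g[f[x]] == x` for every x in the domain and `f[g[y]] == y` for every y in the range.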


COROLLARY. If $f: X \to Y$ is bijective, then $f^{-1}$ is also bijective, and

$$ (f^{-1})^{-1} = f. \qquad (2) $$

Further, suppose that $f: X \to Y$ and $h: Y \to Z$ are bijective mappings. Then the
composition hf is also bijective, and

$$ (hf)^{-1} = f^{-1}h^{-1}. \qquad (3) $$

Proof. By Theorem 2, the bijectivity of f implies the existence of $f^{-1}$. Then
symmetry of the conditions in (1), written in the form $ff^{-1} = e_Y$, $f^{-1}f = e_X$, shows that
f is the inverse of $f^{-1}$, which must then be bijective, by Theorem 2. Next, by
Theorem 2 and the hypothesis, we have inverse mappings $f^{-1}: Y \to X$, $h^{-1}: Z \to Y$ and
their composition $f^{-1}h^{-1}: Z \to X$. The equations

$$ (hf)(f^{-1}h^{-1}) = ((hf)f^{-1})h^{-1} = (h(ff^{-1}))h^{-1} = hh^{-1} = e_Z, $$

$$ (f^{-1}h^{-1})(hf) = f^{-1}(h^{-1}(hf)) = f^{-1}((h^{-1}h)f) = f^{-1}f = e_X $$

imply that $f^{-1}h^{-1}$ is the inverse of hf. □


The mapping $\sigma: \mathbf{N} \to \mathbf{N}$ defined by $\sigma(n) = n + 1$ is injective but not surjective,
since 1 does not belong to $\operatorname{Im} \sigma$. It is interesting that this cannot happen with finite sets.


THEOREM 3. If X is a finite set and the mapping $f: X \to X$ is injective, then it
is bijective.

Proof. We need only show that f is surjective, i.e., for any element $x \in X$ we
must find x' with $f(x') = x$. Set

$$ f^k(x) = f(f \cdots (fx) \cdots) = f(f^{k-1}(x)), \quad k = 1, 2, \ldots. $$

Since X is finite, there must be repetitions in this sequence of elements, say,
$f^m(x) = f^n(x)$, m > n. If n > 0, then, because $f(f^{m-1}(x)) = f(f^{n-1}(x))$ and f is
injective, we must have $f^{m-1}(x) = f^{n-1}(x)$. Canceling f in this way n times, we
obtain an element $x' = f^{m-n-1}(x)$ with the required property: $f(x') = x$. □


It is similarly easy to see that a surjective mapping of a finite set to itself must be
bijective.
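The argument in the proof of Theorem 3 is effectively an algorithm: iterate f starting from x until the sequence returns to x, and take the element reached just before. A minimal Python sketch, assuming the injective mapping of a finite set to itself is stored as a dictionary (the function name `preimage_by_iteration` is hypothetical):

```python
def preimage_by_iteration(f, x):
    """For an injective f: X -> X on a finite set (a dict), find x' with f[x'] == x.

    The loop follows x, f(x), f(f(x)), ...; by the proof of Theorem 3 this
    sequence returns to x. For a non-injective f the loop need not terminate.
    """
    current = x
    while f[current] != x:
        current = f[current]
    return current

f = {0: 1, 1: 2, 2: 0, 3: 3}             # injective on X = {0, 1, 2, 3}
preimages = {x: preimage_by_iteration(f, x) for x in f}
```

Since every element turns out to have a preimage, f is surjective, hence bijective, in agreement with the theorem.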


A few words on cardinality. We say that two sets X and Y have the same
cardinality if and only if there exists a bijective mapping $f: X \to Y$. Sets with the same
cardinality as $\mathbf{N}$ (or $\mathbf{Z}$) are called countable.


EXERCISES 


1. Let $\Omega = \{+, -, ++, +-, -+, --, +++, \ldots\}$ be the set of all finite sequences
of pluses and minuses, and let $f: \Omega \to \Omega$ be the mapping which takes an element
$w = w_1 w_2 \cdots w_n \in \Omega$ to the element $w' = w_1 \bar{w}_1 w_2 \bar{w}_2 \cdots w_n \bar{w}_n$, where $\bar{w}_k = -$ if
$w_k = +$ and $\bar{w}_k = +$ if $w_k = -$. Show that any interval of length > 4 in f(fw)
contains ++ or --.

2. Does the mapping $f: \mathbf{N} \to \mathbf{N}$ given by $n \mapsto n^2$ have a right inverse? Find
two mappings which are both left inverses for f.


3. Let $f: X \to Y$ be a mapping, and let S and T be subsets of X. Show
that

$$ f(S \cup T) = f(S) \cup f(T), \qquad f(S \cap T) \subset f(S) \cap f(T). $$

Give an example showing that the second inclusion cannot, in general, be replaced by
equality.

4. Let the symbol $P(S) = \{T \mid T \subset S\}$ denote the set of all subsets of S. For
example, if $S = \{s_1, s_2, \ldots, s_n\}$ is a finite set with n elements, then P(S) consists
of the empty set $\emptyset$, the n one-element sets $\{s_1\}, \{s_2\}, \ldots, \{s_n\}$, the n(n-1)/2
two-element sets $\{s_i, s_j\}$, $1 \le i < j \le n$, and so on, until we reach T = S. What
is the cardinality of the set P(S)?

5. Let $f: X \to Y$ be a mapping, and let b = f(a) for some $a \in X$. The preimage

$$ f^{-1}(b) = f^{-1}(f(a)) = \{x \mid f(x) = f(a)\} $$

is sometimes called the fibre over the element $b \in \operatorname{Im} f$. Show that the set X is a
disjoint union of fibres, i.e., that the fibres give a partition of X. (WARNING: the
symbol $f^{-1}(b)$ should not be thought of as referring to an inverse mapping, since an
inverse may not exist.)

6. Show that a finite cartesian power of a countable set is itself a countable set.

7. The symbol $S \,\triangle\, T$ denotes the symmetric difference of the two sets S and T:

$$ S \,\triangle\, T = (S \setminus T) \cup (T \setminus S). $$

Show that

$$ S \,\triangle\, T = (S \cup T) \setminus (S \cap T). $$


§6. Equivalence relations. Quotient maps


The idea of equivalence of systems of linear equations, which we introduced in §3,
leads to the thought of introducing such a concept in our general setting, especially since
various types of equivalence are used, often unconsciously, both in logical reasoning and in
daily life.


1. Binary relations. Given two sets X and Y,
any subset $O \subset X \times Y$ is called a binary relation between
X and Y (or simply a binary relation on X, if Y = X).
If an ordered pair (x, y) is an element of O, we use the
notation xOy and say that x has the relation O to y.
This notation is useful, since, for example, the ordering
"<" on the set of real numbers $\mathbf{R}$ is the binary relation
on $\mathbf{R}$ consisting of all points of the plane $\mathbf{R}^2$ which lie above the line y = x (see
Fig. 6); in this case the cumbersome notation

$$ (x, y) \in O \quad (O = \ <) $$

can be replaced by the usual inequality x < y.

[Fig. 6]
To every function $f: X \to Y$ we associate its graph, which is the subset

$$ \Gamma(f) = \{(x, y) \mid x \in X, \ y = f(x)\} \subset X \times Y. $$

The graph $\Gamma(f)$ is a binary relation between X and Y. The graphs in $\mathbf{R}^2$ of
functions $\mathbf{R} \to \mathbf{R}$ are studied in calculus courses. It is clear that not every binary
relation O can be the graph of a function. A binary relation O is the graph of some
function from X to Y if and only if for every $x \in X$ there is exactly one y with
xOy.

Specifying X, Y, and the graph $\Gamma(f)$ is enough to reconstruct the function f.


2. Equivalence relations. A binary relation ~ on X is called an equivalence
relation if the following conditions hold for all $x, x', x'' \in X$:
(i) x ~ x (reflexivity);
(ii) x ~ x' => x' ~ x (symmetry);
(iii) x ~ x' and x' ~ x'' => x ~ x'' (transitivity).


The notation $a \not\sim b$ means that the elements $a, b \in X$ are not equivalent.
The subset

$$ \bar{x} = \{x' \in X \mid x' \sim x\} \subset X $$

of all elements equivalent to a given x is called the equivalence class containing x.
Since x ~ x by (i), we do have $x \in \bar{x}$. Any element $x' \in \bar{x}$ is called a representative
of the class $\bar{x}$. We have the following fact:

Definition. If a set X is a disjoint union of subsets $X_i$, we say that $\{X_i\}$ is a
partition of X.

PROPOSITION. The set of ~-equivalence classes is a partition of the set X.
(This partition can be denoted by the symbol $\pi_\sim(X)$.)


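Proof. For any $x \in X$ we have $x \in \bar{x}$; hence, $X = \bigcup_{x \in X} \bar{x}$. Next, a class $\bar{x}$
is uniquely determined by any representative in it, i.e., $\bar{x} = \bar{x'} \Longleftrightarrow x \sim x'$. In one
direction: $x \sim x'$ and $x'' \in \bar{x}$ => $x'' \sim x$ => $x'' \sim x'$ => $x'' \in \bar{x'}$ => $\bar{x} \subset \bar{x'}$. But
$x \sim x'$ => $x' \sim x$ by (ii); hence, the reverse inclusion also holds: $\bar{x'} \subset \bar{x}$. Thus,
$\bar{x} = \bar{x'}$. In the other direction: since $x \in \bar{x}$, we have $\bar{x'} = \bar{x}$ => $x \in \bar{x'}$ => $x \sim x'$.
Now suppose $\bar{x'} \cap \bar{x''} \neq \emptyset$. If $x \in \bar{x'} \cap \bar{x''}$, then $x \sim x'$ and $x \sim x''$, so
that, by (ii) and (iii), we have $x' \sim x''$, hence $\bar{x'} = \bar{x''}$. Thus, distinct classes are
disjoint. □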
Let $\Pi = \mathbf{R}^2$ be the real plane with rectangular coordinates. If we take ~ between
two points $P, P' \in \Pi$ to mean that P and P' lie on the same horizontal line, we
obviously have an equivalence relation whose equivalence classes are the horizontal lines
(Fig. 7). Similarly, the hyperbolas $\Gamma_p$ (Fig. 8) of the form xy = p, where p > 0,
determine an equivalence relation in the region $\Pi_+ \subset \Pi$ of points P(x, y) with coordinates
x > 0, y > 0. These geometrical examples visually illustrate the following assertion.

[Fig. 8]

If $\pi(X)$ is a partition of a set X into disjoint subsets $C_\alpha$, then the $C_\alpha$ are
equivalence classes for some equivalence relation ~ on X.

Proof. By assumption, each element $x \in X$ is contained in precisely one subset
$C_\alpha$. We define ~ by saying that x ~ x' if and only if x and x' lie in the same
$C_\alpha$. This relation ~ is obviously reflexive, symmetric, and transitive, i.e., it is an
equivalence relation. Furthermore, $x \in C_\alpha$ => $\bar{x} = C_\alpha$, so that $\pi_\sim(X) = \pi(X)$. □
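For a finite set the passage from an equivalence relation to its partition can be carried out mechanically. The Python sketch below is an illustration with assumed names (`equivalence_classes`, `related`); it relies on the fact, proved above, that a class is determined by any one of its representatives, so each new element need only be compared with the first element of each class found so far.

```python
def equivalence_classes(X, related):
    """Partition the finite collection X into classes of the equivalence relation `related`.

    `related(a, b)` must be reflexive, symmetric and transitive; otherwise the
    result is not meaningful.
    """
    classes = []
    for x in X:
        for cls in classes:
            if related(x, cls[0]):      # compare with one representative only
                cls.append(x)
                break
        else:
            classes.append([x])         # x starts a new class
    return classes

# Example relation (chosen for illustration): same remainder on division by 3.
classes = equivalence_classes(range(10), lambda a, b: a % 3 == b % 3)
```

The resulting classes are pairwise disjoint and their union is the whole set, as the Proposition asserts.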


3. Quotient maps. We just saw that there is a one-to-one correspondence between
equivalence relations and partitions of a set X. It is customary to let the symbol X/~
denote the partition of X corresponding to ~; this set of equivalence classes is called
the quotient set of X relative to ~. The surjective mapping

$$ p: x \mapsto p(x) = \bar{x} \qquad (1) $$

is called the natural mapping (or the canonical projection) of X onto the quotient set
X/~. In the example in Fig. 7, X/~ is the set of horizontal lines, and the canonical
projection is the mapping that associates to each point in $\mathbf{R}^2$ the horizontal line through it.

Let X and Y be two sets, and let $f: X \to Y$ be a mapping. The binary
relation $O_f$:

$$ x \, O_f \, x' \Longleftrightarrow f(x) = f(x'), \qquad x, x' \in X, $$

is clearly reflexive (f(x) = f(x)), symmetric (f(x') = f(x) => f(x) = f(x')), and
transitive (f(x) = f(x') and f(x') = f(x'') => f(x) = f(x'')). Thus, $O_f$ is an equivalence
relation on X. The equivalence classes $\bar{x}$ are the fibres (preimages) in the sense of
Exercise 5 of §5. In other words,

$$ \bar{x} = \{x' \mid f(x') = f(x)\}. $$

The mapping $f: X \to Y$ induces a mapping $\bar{f}: X/O_f \to Y$ given by the rule

$$ \bar{f}(\bar{x}) = f(x), \qquad (2) $$

or, if we use our notation for the canonical projection (see (1)),

$$ \bar{f}(p(x)) = f(x). \qquad (2') $$

Since $\bar{x} = \bar{x'} \Longleftrightarrow f(x) = f(x')$, it follows that the equality (2) defining $\bar{f}$ does not
depend on the choice of representative x of the equivalence class $\bar{x}$. (In this case, it is
customary to say that $\bar{f}$ is well-defined, or that the definition (2) is correct.) In other
words, by definition f has a fixed value on all x in an equivalence class; hence we can
think in terms of a function $\bar{f}$ whose domain is a set of equivalence classes. For example,
if $f: \mathbf{R}^2 \to \mathbf{R}$ is defined by f(x, y) = y, then we get the equivalence relation in Fig. 7,
and $\bar{f}$(the line $y = y_0$) $= y_0$.


The commutative diagram

[Triangular diagram with vertices X, Y and $X/O_f$ and arrows $f : X \to Y$, $p : X \to X/O_f$, $\bar{f} : X/O_f \to Y$]

depicts the factoring (decomposition)

$f = \bar{f} p$   (3)

of f into a product of a surjective mapping p and an injective mapping $\bar{f}$. Notice that $\bar{f}$ is injective because

$\bar{f}(\bar{x}_1) = \bar{f}(\bar{x}_2) \iff f(x_1) = f(x_2) \iff \bar{x}_1 = \bar{x}_2$.

The mapping $\bar{f}$ is surjective if and only if f is surjective. Note that, if $f' : X/O_f \to Y$ is another mapping for which $f'p = f$, then since $f'(\bar{x}) = f'(px) = (f'p)x = f(x) = \bar{f}(\bar{x})$ (by (2)), it follows that in fact $f' = \bar{f}$. Thus, the mapping $\bar{f}$ which makes the above triangular diagram commute is unique.
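The factoring of a mapping through its quotient set can be checked mechanically for a finite example. In the sketch below the set `X`, the mapping `f` (squaring) and the helper names `fibres`, `p`, `f_bar` are illustrative assumptions; only the construction itself comes from the text.

```python
def fibres(X, f):
    """Group the elements of X into the fibres of f (the classes of O_f)."""
    classes = {}
    for x in X:
        classes.setdefault(f(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}

X = range(-3, 4)
f = lambda x: x * x

quotient = fibres(X, f)   # the quotient set X/O_f

def p(x):
    """Canonical projection: send x to its equivalence class."""
    return next(c for c in quotient if x in c)

def f_bar(cls):
    """Induced mapping: evaluate f on any representative of the class."""
    return f(next(iter(cls)))

# f factors as f_bar after p, and f_bar takes distinct values on distinct classes.
factoring_holds = all(f_bar(p(x)) == f(x) for x in X)
f_bar_injective = len({f_bar(c) for c in quotient}) == len(quotient)
```

Here the classes are {0}, {-1, 1}, {-2, 2}, {-3, 3}; the value of `f_bar` does not depend on which representative `next(iter(cls))` happens to return, which is exactly the well-definedness discussed above.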




4. Ordered sets. By an ordering on a set X we mean a binary relation $\le$ on X which has the properties of reflexivity ($x \le x$), anti-symmetry ($x \le y$ and $y \le x \Rightarrow x = y$), and transitivity ($x \le y$ and $y \le z \Rightarrow x \le z$). If $x \le y$ and $x \ne y$, we write $x < y$. If $x \le y$ we also write $y \ge x$. It is possible for a pair of elements $x, x' \in X$ not to be related in either direction by $\le$. But if our X and $\le$ are such that either $x \le x'$ or $x' \le x$ for every pair of elements in X, then X is said to be linearly ordered (or totally ordered). In the general case we speak of a partial ordering on X.

Some examples of partially ordered sets are: the set $X = P(S)$ of subsets of a set S (see Exercise 4 of §5) with $\le$ being the usual inclusion relation $R \subset T$ between subsets, and the set $\mathbb{N}$ of natural numbers with $\le$ being the relation $d \mid n$ (n is divisible by d).

Let X be an arbitrary partially ordered set, and let x and y be elements of X. We say that y follows (or covers) x if $x < y$ and there does not exist z with $x < z < y$. If X is a finite set, then $x < y$ if and only if there is a chain of elements $x = x_1, x_2, \ldots, x_{n-1}, x_n = y$ in which $x_{i+1}$ follows $x_i$. The notion of following (covering) is useful for depicting a finite partially ordered set X by a plane diagram. The elements of X are represented by points. If y covers x, then y is placed higher than x and x is joined to y by a straight line. If y and x are related by <, they are joined by a "descending" polygonal line, and perhaps by several such lines. The first diagram in Fig. 9 depicts an interval of the natural numbers with the usual ordering $\le$ (under which $\mathbb{N}$ is linearly ordered); the second diagram shows $P(\{a,b,c\})$ with the inclusion ordering described above.


Fig. 9


A greatest element of a partially ordered set X is an element $n \in X$ such that $x \le n$ for all $x \in X$, and a maximal element is an element $m \in X$ such that $m \le x \in X$ implies that $x = m$. A greatest element is always maximal, but the converse is not true. There can be many maximal elements, but the greatest element, if it exists, is unique. There are analogous definitions and remarks for least and minimal elements. The first two diagrams in Fig. 9 each have a greatest and a least element. In the third diagram there are three maximal elements and one least element, but no greatest element.
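The covering relation and the difference between maximal and greatest elements can be computed directly for a small divisibility ordering. This is a sketch under assumptions: the set {1, 2, 3, 4, 6} and the function names `leq`, `covers`, `maximal`, `greatest` are chosen here for illustration and do not appear in the text.

```python
def leq(x, y):
    """The ordering by divisibility: x <= y means x divides y."""
    return y % x == 0

def covers(X):
    """Pairs (x, y) in which y covers x: x < y with nothing strictly between."""
    return {
        (x, y)
        for x in X for y in X
        if x != y and leq(x, y)
        and not any(z not in (x, y) and leq(x, z) and leq(z, y) for z in X)
    }

def maximal(X):
    """Elements m such that m <= x implies x == m."""
    return {m for m in X if all(x == m for x in X if leq(m, x))}

def greatest(X):
    """The element n with x <= n for all x, or None if there is none."""
    candidates = [n for n in X if all(leq(x, n) for x in X)]
    return candidates[0] if candidates else None

X = {1, 2, 3, 4, 6}
edges = covers(X)        # the line segments of the plane diagram
tops = maximal(X)        # two maximal elements: 4 and 6
top = greatest(X)        # None: neither 4 nor 6 is divisible by the other
```

The pairs in `edges` are exactly the segments one would draw in the diagram; adjoining 12 to the set produces a greatest element, since every element then divides 12.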

The theory of partially ordered algebraic systems (Boolean algebras, lattices) is full 
of interesting results and occupies an important place in algebra, but we cannot go into it 
here. This section has the modest goal of acquainting the reader with a further example of 
a binary relation and giving some exposure to diagrams of the type that will later be of use 
in understanding, for example, the location and interrelation of subgroups in a group or subfields in a field.
EXERCISES 


1. Give a one-to-one correspondence between the quotient set $\mathbb{R}^2/\!\sim$ obtained from the picture in Fig. 7 and the points of an arbitrary straight line $\ell$ which intersects the x-axis.

2. Let $P(x,y) \sim P'(x',y')$ for points of $\mathbb{R}^2$ if and only if both $x' - x \in \mathbb{Z}$ and $y' - y \in \mathbb{Z}$. Prove that ~ is an equivalence relation, and that the quotient set $\mathbb{R}^2/\!\sim$ is in one-to-one correspondence with the points on a torus (surface of a donut, see Fig. 10).


Fig. 10 


3. Show that sets of two, three, and four elements have, respectively, 2, 5, 15 different quotient sets (i.e., that there are 2, 5, 15 different equivalence relations that can be defined on them).


4. Let ~ be an equivalence relation on a set X, and let $f : X \to Y$ be a mapping for which $x \sim x' \Rightarrow f(x) = f(x')$. Show that this compatibility condition of f with ~ allows us to construct a well-defined mapping $\bar{f} : \bar{x} \mapsto f(x)$ from X/~ to Y which gives a factoring of f: $f = \bar{f} p$. However, note that $\bar{f}$ is no longer necessarily injective. What condition must be satisfied in order for $\bar{f}$ to be injective?


5. Draw diagrams for the following partially ordered sets: (1) $P(\{a,b,c,d\})$, (2) the set of all divisors of 24 (use the ordering by divisibility: $x \le y \iff x \mid y$).
§7. The principle of mathematical induction


The set $\mathbb{N} = \{1, 2, 3, \ldots\}$ of all natural numbers (positive integers) is considered to be a very familiar set. Actually, the point of departure for studying $\mathbb{N}$ is the axioms of Peano (1858-1932). His three axioms (which we shall not list here) can be used to derive the properties of addition and multiplication and the linear ordering (see above) of the natural numbers (more precisely, of the non-negative integers $\mathbb{N} \cup \{0\}$). Those axioms imply the following intuitively clear assertion: every non-empty set $S \subset \mathbb{N}$ has a least element, i.e., there is a natural number $s \in S$ which is smaller than all of the other numbers in S. Using this assertion, one can deduce the following

Principle of Induction. Suppose that for every $n \in \mathbb{N}$ we are interested in an assertion M(n). Further suppose that we know that M(1) is true, and we have a procedure which tells us that $M(\ell)$ is true whenever M(k) is true for all $k < \ell$. Then M(n) is true for all $n \in \mathbb{N}$.

To prove this from the "least element principle" in the previous paragraph, let

$S = \{s \in \mathbb{N} \mid M(s) \text{ is false}\} \subset \mathbb{N}$.

Suppose S is non-empty. Then S has a least element $s_0$. This means that $M(s_0)$ is false, but M(s) is true for all $s < s_0$. But this contradicts our assumption, which assures us that we know $M(s_0)$ to be true whenever M(s) is true for $s < s_0$.

This is not the place for a philosophical discussion of the principle of induction. 
Suffice it to say that, in some sense, it reflects the essence of what is meant by a natural 
sequence.

Notice that, when using the principle of induction, it is essential to establish a starting point for the induction, i.e., to verify that the assertion holds for a certain small n. If this step is neglected, one can "prove" such ludicrous assertions as "all students have the same height". Here is the argument: The empty set of students and the set of one student have this property. Proceeding by induction, we suppose that any set of $\le n$ students has the property. Then, in a set of n+1 students, the first n and the last n students have the same height by the induction assumption. These sets intersect in a set of n-1 students, all with the same height. Hence, all n+1 students have the same height. The fallacy here is that the first use of the induction step relates to the set of any two students, and it is there that the induction step is unjustified. For how many low values of n must we verify an assertion before we can be sure that the induction step is valid? Usually this is clear from the proof. In our example the assumption implicit in the induction step is that the two sets of size n in the set of n+1 elements must have non-empty intersection; this means that $n \ge 2$.

In more complicated situations, especially when one defines or constructs an object 
by induction using recursion relations (as we shall do for the determinant of a matrix in 
Chapter 3), one must pay careful attention to establishing a basis, or starting point, for the 
proof or definition by induction. On the other hand, one must not go to the other extreme of 


erroneously concluding M(n) for all $n \in \mathbb{N}$ on the basis of a case-by-case verification of M(k) for all k in a very long sequence $1 \le k \le \ell$. Here are two unpleasant examples
of what that can lead to. 


1. Fermat conjectured that all numbers of the form $F_n = 2^{2^n} + 1$, $n = 0, 1, 2, \ldots$ (the so-called "Fermat numbers") are primes. (Concerning prime numbers, see §8.) The first five Fermat numbers are in fact prime, but Euler found that $F_5$ is composite: $F_5 = 4294967297 = 641 \cdot 6700417$. Persistent attempts to find at least one more prime Fermat number using the latest computers have not yet met with success. One of the most recent "accomplishments" in this area was the verification that $F_{1945}$ is divisible by $5 \cdot 2^{1947} + 1$.


2. If one looks at numbers of the form $n^2 - n + 41$ (a polynomial studied by Euler) for $n = 1, 2, \ldots, 40$, one might think that this polynomial takes prime values for all n. However, $41^2 - 41 + 41 = 41^2$.
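Both cautionary examples are easy to reproduce numerically. The sketch below is an illustration only; the trial-division helper `is_prime` and the variable names are assumptions made here, and the divisibility of $F_{1945}$ by $5 \cdot 2^{1947} + 1$ is tested through the equivalent congruence $2^{2^{1945}} \equiv -1$ modulo that number.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Fermat numbers F_0, ..., F_5.
fermat = [2 ** (2 ** n) + 1 for n in range(6)]
first_five_prime = all(is_prime(F) for F in fermat[:5])
F5_factored = fermat[5] == 641 * 6700417

# F_1945 itself is far too large to write down, but modular exponentiation
# can still decide whether m = 5 * 2**1947 + 1 divides it.
m = 5 * 2 ** 1947 + 1
divides_F1945 = pow(2, 2 ** 1945, m) == m - 1

# Euler's polynomial n^2 - n + 41 is prime for n = 1..40 and fails at n = 41.
euler = lambda n: n * n - n + 41
prime_up_to_40 = all(is_prime(euler(n)) for n in range(1, 41))
fails_at_41 = euler(41) == 41 ** 2
```

Forty consecutive successes followed by a failure at n = 41 is exactly the situation the text warns about: a long run of verified cases is not a substitute for the induction step.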


Sometimes the most important part of proving a formula by induction is having the 
right form for the formula one is trying to prove. For example, suppose that we want to find 


the sum

$p_k(n) = 1^k + 2^k + \cdots + (n-1)^k + n^k, \quad k = 1, 2, 3$.

The problem becomes much easier when we are told that the answer is supposed to be the following expressions:

$p_1(n) = \frac{n(n+1)}{2}, \quad p_2(n) = \frac{n(n+1)(2n+1)}{6}, \quad p_3(n) = \left[\frac{n(n+1)}{2}\right]^2$.

Although $p_1(n)$ is not hard to think of (Gauss did this as a young boy), the form of $p_2(n)$ and $p_3(n)$ is not quite so trivial, and the relation

$p_5(n) + p_7(n) = 2\,[p_1(n)]^4$


could probably only be thought of if one had developed a framework or theory to predict these 
expressions. In this case such general procedures can be found, but that is not our concern 
here. Once we know the formula we want to prove, all we have to do to prove it is make the 
trivial computation for n = 1, and then, also by direct computation, verify the induction step from n to n+1. It would be a worthwhile exercise for the reader to carry out this procedure for the formulas given above.
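Before attempting the induction, it can be reassuring to compare the closed forms with brute-force sums. The following sketch (the function name `p` and the range of n are illustrative choices) checks the formulas for $p_1$, $p_2$, $p_3$ and the identity $p_5(n) + p_7(n) = 2[p_1(n)]^4$ for n up to 50. As the preceding discussion of Fermat numbers stresses, such a finite check suggests the formulas but does not prove them.

```python
def p(k, n):
    """Brute-force power sum 1^k + 2^k + ... + n^k."""
    return sum(i ** k for i in range(1, n + 1))

formulas_agree = all(
    p(1, n) == n * (n + 1) // 2
    and p(2, n) == n * (n + 1) * (2 * n + 1) // 6
    and p(3, n) == (n * (n + 1) // 2) ** 2
    and p(5, n) + p(7, n) == 2 * p(1, n) ** 4
    for n in range(1, 51)
)

# The induction step for p_1, written out for a single n:
# p_1(n) + (n + 1) must equal the closed form evaluated at n + 1.
n = 10
step_ok = n * (n + 1) // 2 + (n + 1) == (n + 1) * (n + 2) // 2
```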


To do the above exercise one uses the so-called binomial formula

$(a+b)^n = a^n + \binom{n}{1} a^{n-1} b + \cdots + \binom{n}{k} a^{n-k} b^k + \cdots + b^n$.   (1)

Here a and b are arbitrary numbers, and the binomial coefficient $\binom{n}{k}$ of the monomial $a^{n-k} b^k$ has the form

$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 2 \cdot 1}$.   (2)

It is useful to adopt the convention that $0! = 1$ and $\binom{n}{k} = 0$ for $k < 0$. Note that

$\binom{n}{n-k} = \binom{n}{k}$

(the symmetry property for binomial coefficients).

We prove (1) by induction on n. The formula is obviously true for n = 1, 2. We assume it holds for all exponents $\le n$, and then multiply both sides of (1) by a+b. We obtain

$(a+b)^{n+1} = (a+b)^n (a+b) = a^n(a+b) + \cdots + \binom{n}{k-1} a^{n-k+1} b^{k-1}(a+b) + \binom{n}{k} a^{n-k} b^k (a+b) + \cdots + b^n(a+b)$
$= a^{n+1} + \cdots + \binom{n}{k-1} a^{n-k+1} b^k + \binom{n}{k} a^{n-k+1} b^k + \cdots + b^{n+1}$.

Combining similar terms, we see that the coefficient of $a^{n-k+1} b^k$ is

$\binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} = \frac{n!}{(k-1)!\,(n-k)!}\left[\frac{1}{n-k+1} + \frac{1}{k}\right]$
$= \frac{n!}{(k-1)!\,(n-k)!} \cdot \frac{n+1}{k(n-k+1)} = \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k}$,

i.e., the binomial coefficient of the form (2) with upper index increased by one. We have thereby proved (1) for all $n \in \mathbb{N}$.
If we write

$(a+b)^n = (a+b)(a+b) \cdots (a+b)$,

give each factor on the right an index from 1 to n, and look at all ways of choosing k b's to get $a^{n-k} b^k$, i.e., at all subsets of k indices $1 \le i_1 < \cdots < i_k \le n$, we conclude that $\binom{n}{k}$ is equal to the number of subsets of k elements in a set of n elements. For this reason the alternate (and somewhat old-fashioned) notation and terminology $C_n^k = \binom{n}{k}$, the combination of n things taken k at a time, is sometimes used.


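If we think of the binomial coefficients as the number of subsets of given cardinality, we see that the cardinality of $P(\{s_1, \ldots, s_n\})$ (see Exercise 4 of §5) is equal to

$\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n-1} + \binom{n}{n}$.

But, setting a = b = 1 in (1), we obtain

$\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = (1+1)^n = 2^n$.

Thus, $\mathrm{Card}\, P(\{s_1, \ldots, s_n\}) = 2^n$.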
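The three facts about binomial coefficients used in this passage (the addition rule from the induction step, the symmetry property, and the count of subsets) can be checked for a fixed n with the standard library. The choice n = 6 and the variable names are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

n = 6

# Addition rule from the induction step: C(n, k-1) + C(n, k) = C(n+1, k).
pascal_rule = all(comb(n, k - 1) + comb(n, k) == comb(n + 1, k)
                  for k in range(1, n + 1))

# Symmetry property: C(n, n-k) = C(n, k).
symmetry = all(comb(n, n - k) == comb(n, k) for k in range(n + 1))

# Number of k-element subsets of an n-element set, counted by enumeration.
subset_counts = [len(list(combinations(range(n), k))) for k in range(n + 1)]
counts_match = subset_counts == [comb(n, k) for k in range(n + 1)]

# Setting a = b = 1 in the binomial formula: the counts add up to 2^n.
total_subsets = sum(subset_counts)
```

`math.comb` requires Python 3.8 or later; on older versions the quotient of factorials in (2) can be used instead.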


Sometimes a theorem can be proved or an object constructed using a more complicated 
form of induction. For example, we can use the following principle of "double induction". 
Suppose that for any pair of natural numbers m and n we are interested in an assertion A(m,n). Suppose that: (i) A(m,1) and A(1,n) are true for all m and n; (ii) if $A(k-1, \ell)$ and $A(k, \ell-1)$ are true, then $A(k, \ell)$ is also true (equivalently: (ii') if $A(k', \ell')$ is true for all $k' \le k$ and $\ell' \le \ell$ for which $k' + \ell' < k + \ell$, then $A(k, \ell)$ is also true). Then the assertion A(m,n) is true for all natural numbers m and n.
§8. Integer arithmetic


The purpose of this section is to give a brief description of the simplest divisibility 
properties of the integers, which we shall have occasion to refer to in various connections 


later in the book. Further facts will be given in Chapter 5, where the theory of divisibility is carried over to more general algebraic systems.


1. The fundamental theorem of arithmetic. An integer s is called a divisor (or factor) of an integer n if n = st for some $t \in \mathbb{Z}$. In that case n is called a multiple of s. The notation $s \mid n$ means that s is a divisor of n, and $s \nmid n$ means that it is not. Divisibility is a transitive relation on $\mathbb{Z}$. In addition, if both $m \mid n$ and $n \mid m$, then we must have $n = \pm m$, in which case the integers n and m are called associated. An integer p whose only divisors are $\pm p$ and $\pm 1$ (the improper divisors) is called prime. One usually agrees to consider prime numbers to be positive and > 1.
The basic role played by prime numbers is brought out by the so-called 

Fundamental Theorem of Arithmetic. Every positive integer $n \ne 1$ can be written as a product of prime numbers: $n = p_1 p_2 \cdots p_s$. This prime decomposition is unique except for the order of the factors.

The proof of the Fundamental Theorem will be postponed until Chapter 5. At first 
glance, it may seem so obvious that we should not have to prove it. But the proof is not so 
trivial. Although the theorem itself only refers to multiplicative properties of integers 
(divisibility), it turns out to be necessary to use both multiplication and addition in $\mathbb{Z}$ in the proof. To illustrate the non-triviality of the theorem, let us consider the subset $S = \{4k+1 \mid k = 0, 1, 2, \ldots\} \subset \mathbb{N}$. S is closed with respect to multiplication: $(4k_1 + 1)(4k_2 + 1) = 4k_3 + 1$. Using induction on $n \in S$, it is not hard to prove that any $n \in S$ can be written $n = q_1 \cdots q_r$, where the $q_i$ are elements of S which cannot be further factored into elements of S (this is analogous to the first part of the Fundamental Theorem). Examples of indecomposable elements of S are 5, 9, 13, 17, 21, 49. But the second part of the Fundamental Theorem is false for S, since, for example, the integer $441 \in S$ has two different decompositions as a product of indecomposable elements of S:

$441 = 9 \cdot 49 = 21 \cdot 21$.
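The failure of unique factorization in S can be exhibited by enumerating all decompositions. The sketch below is illustrative; the function names `indecomposable` and `factorizations` are assumptions introduced here. It relies on the observation that if n and a both lie in S and a divides n, then n/a also lies in S.

```python
def indecomposable(n):
    """True if n is in S = {4k+1}, n > 1, and n is not a product of two
    elements of S that are both greater than 1."""
    if n <= 1 or n % 4 != 1:
        return False
    a = 5
    while a * a <= n:
        if n % a == 0:      # the cofactor n // a is then automatically in S
            return False
        a += 4
    return True

def factorizations(n, smallest=5):
    """All non-decreasing tuples of indecomposable elements with product n."""
    if n == 1:
        return [()]
    result = []
    for q in range(smallest, n + 1, 4):
        if n % q == 0 and indecomposable(q):
            result += [(q,) + rest for rest in factorizations(n // q, q)]
    return result

examples_ok = all(indecomposable(q) for q in (5, 9, 13, 17, 21, 49))
decompositions_of_441 = factorizations(441)
```

For 441 the enumeration returns the two decompositions (9, 49) and (21, 21) named in the text, while an element such as 45 = 5 · 9 has only one.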
In the Fundamental Theorem, if we combine identical prime factors and modify our notation, we can write n in the form $n = p_1^{\varepsilon_1} p_2^{\varepsilon_2} \cdots p_k^{\varepsilon_k}$, $\varepsilon_i > 0$. Every rational number $a = n/m \in \mathbb{Q}$ has a similar decomposition, except that the exponents $\varepsilon_i$ can be either positive or negative. Note the following important fact (a theorem of Euclid):

The set P= {2,3,5,7,11,13,...} of all prime numbers is infinite. 

To prove this, suppose that there were only finitely many primes, say $p_1, p_2, \ldots, p_t$. Then, by the Fundamental Theorem, the number $c = p_1 p_2 \cdots p_t + 1$ would be divisible by at least one of the $p_i$. Without loss of generality we may assume that $c = p_1 c'$. Then $p_1(c' - p_2 \cdots p_t) = 1$, which is impossible, since the only divisors of one in $\mathbb{Z}$ are $\pm 1$.


2. g.c.d. and l.c.m. in $\mathbb{Z}$. If we agree to allow zero exponents for primes in a factorization (of course, taking $p_i^0 = 1$), any two integers n and m can be written as a product of the same primes:

$n = \pm p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}, \quad m = \pm p_1^{\beta_1} p_2^{\beta_2} \cdots p_k^{\beta_k}$.

We introduce the two integers

$\text{g.c.d.}(n,m) = p_1^{\gamma_1} p_2^{\gamma_2} \cdots p_k^{\gamma_k}, \quad \text{l.c.m.}(n,m) = p_1^{\delta_1} p_2^{\delta_2} \cdots p_k^{\delta_k}$,   (1)

where $\gamma_i = \min(\alpha_i, \beta_i)$, $\delta_i = \max(\alpha_i, \beta_i)$, $i = 1, 2, \ldots, k$. Since

$d \mid n \Rightarrow d = \pm p_1^{\alpha'_1} \cdots p_k^{\alpha'_k}, \quad 0 \le \alpha'_i \le \alpha_i$,

the following assertions follow from the definitions (1):

(i) g.c.d.(n,m) | n, g.c.d.(n,m) | m, and if d | n and d | m, then d | g.c.d.(n,m);
(ii) n | l.c.m.(n,m), m | l.c.m.(n,m), and if n | u and m | u, then l.c.m.(n,m) | u.

It is properties (i) and (ii) which explain the terminology greatest common divisor (g.c.d.) and least common multiple (l.c.m.). For n > 0, m > 0 we have the relation

$\text{g.c.d.}(n,m) \cdot \text{l.c.m.}(n,m) = nm$.   (2)

Two integers n, m are called relatively prime if g.c.d.(n,m) = 1. In this case (2) takes the form: l.c.m.(n,m) = nm.
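Definition (1) translates directly into a computation with exponent tables. In this sketch the helper names `factorize` and `gcd_lcm` are assumptions, and factorization is done by naive trial division, which is only sensible for small inputs; the result is compared with the standard library's `math.gcd`.

```python
from collections import Counter
from math import gcd

def factorize(n):
    """Prime factorization of n >= 1 as a Counter {prime: exponent}."""
    exps = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] += 1
            n //= p
        p += 1
    if n > 1:
        exps[n] += 1
    return exps

def gcd_lcm(n, m):
    """g.c.d. and l.c.m. of positive integers via min/max of exponents."""
    a, b = factorize(n), factorize(m)
    g = l = 1
    for p in set(a) | set(b):          # a missing prime counts as exponent 0
        g *= p ** min(a[p], b[p])
        l *= p ** max(a[p], b[p])
    return g, l

g, l = gcd_lcm(360, 84)                # 360 = 2^3 3^2 5, 84 = 2^2 3 7
relation_2 = g * l == 360 * 84         # relation (2)
```

Since min(α, β) + max(α, β) = α + β for each prime, relation (2) holds exponent by exponent, which is what `relation_2` confirms on this example.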


3. The division algorithm in $\mathbb{Z}$. Given $a, b \in \mathbb{Z}$, b > 0, there always exist $q, r \in \mathbb{Z}$ such that

$a = bq + r, \quad 0 \le r < b$.

(If we only require $b \ne 0$, then we have the same thing with $0 \le r < |b|$.)

Proof. The set $S = \{a - bs \mid s \in \mathbb{Z},\ a - bs \ge 0\}$ is clearly non-empty (for example, $a - b(-a^2) \ge 0$). Hence, S contains a least element; let us denote this element r = a - bq. By the definition of S, $r \ge 0$. If we had $r \ge b$, we would obtain an element $r - b = a - b(q+1) \in S$ which is less than r. This contradicts the definition of r, so we must have r < b. □


The simple proof also gives a prescription (algorithm) for finding the quotient q and the remainder r in a finite number of steps. This division algorithm can be used to give another definition of g.c.d. (and hence of l.c.m., because of (2)).
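The prescription contained in the proof can be written out literally: start from some element of S and keep subtracting b until the remainder drops below b. The sketch below takes $s = -a^2$ as the starting point, for which $a - bs = a + ba^2 \ge 0$ whenever $b \ge 1$; the function name `divide` is an assumption. This is deliberately the slow procedure of the proof, not an efficient routine, and it is compared with Python's built-in `divmod`.

```python
def divide(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, for b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    q = -(a * a)          # starting point: a - b*q = a + b*a^2 >= 0
    r = a - b * q         # an element of S
    while r >= b:         # r - b = a - b*(q + 1) is a smaller element of S
        q += 1
        r -= b
    return q, r

q, r = divide(-7, 2)      # -7 = 2*(-4) + 1
```

Note that the remainder is non-negative even for negative a, in agreement with the condition $0 \le r < b$.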

Namely, given integers n and m, not both zero, set

$J = \{nu + mv \mid u, v \in \mathbb{Z}\}$.   (3)

Choose the least positive element in J: $d = nu_0 + mv_0$. Using the division algorithm, write n = dq + r, $0 \le r < d$. Because of our choice of d, the fact that

$r = n - dq = n - (nu_0 + mv_0)q = n(1 - u_0 q) + m(-v_0 q) \in J$

implies r = 0. Hence, d | n. We similarly prove that d | m. Now let d' be any divisor of the integers n and m. Then

$d' \mid n,\ d' \mid m \Rightarrow d' \mid nu_0,\ d' \mid mv_0 \Rightarrow d' \mid (nu_0 + mv_0) \Rightarrow d' \mid d$.

Thus, d has all of the properties of the greatest common divisor, and so d = g.c.d.(n,m). (Note that there can be only one positive integer, the g.c.d., with property (i) above, since if there were two, g and g', we would have g | g' and g' | g, and so $g' = \pm g$.) We have proved the following assertion.


Given two integers n and m, not both zero, their greatest common divisor can always be written in the form

$\text{g.c.d.}(n,m) = nu + mv; \quad u, v \in \mathbb{Z}$.   (4)

In particular, two integers n and m are relatively prime if and only if

$nu + mv = 1$   (4')

for some $u, v \in \mathbb{Z}$.

The last part of this assertion follows because we have already verified that we can write 1 = g.c.d.(n,m) = nu + mv; and, conversely, if (4') holds, then $d \mid n,\ d \mid m \Rightarrow d \mid nu,\ d \mid mv \Rightarrow d \mid (nu + mv) \Rightarrow d \mid 1 \Rightarrow d = \pm 1$. □


The proof of (4) and (4’) was effective. One takes an arbitrary positive element of 
J (see (3)), and then finds smaller and smaller elements of J using the division 


algorithm, until one obtains the least element, which will be the g.c.d. 
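One standard way to organize this descent through J is the extended Euclidean algorithm, sketched below. It is offered as an assumption about how one might implement the idea, not as the procedure of the text: every remainder it produces is an element of J, expressed explicitly as nu + mv, and the last non-zero remainder is the g.c.d. The function name `extended_gcd` is hypothetical.

```python
def extended_gcd(n, m):
    """Return (d, u, v) with d = g.c.d.(n, m) >= 0 and d == n*u + m*v."""
    # Invariants: old_r == n*old_u + m*old_v and r == n*u + m*v,
    # so both remainders always belong to the set J.
    old_r, r = n, m
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r                       # division algorithm
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:                            # normalize the sign of d
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v

d, u, v = extended_gcd(240, 46)              # d == 2 == 240*u + 46*v
```

For relatively prime inputs the returned pair (u, v) is a solution of (4').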
EXERCISES 


1. Every prime number > 2 has the form 4k+1 or 4k-1. Using the multiplicativity of the set S in subsection 1, prove that there are infinitely many primes of the form 4k-1.

2. It can be proved that if $n, m \in \mathbb{Z}$, g.c.d.(m,n) = 1, and p is an odd prime dividing $n^2 + m^2$, then p is of the form 4k+1. (See subsection 1 of §2, Ch. 9.) Use this fact to prove that there are infinitely many prime numbers of the form 4k+1.


3. If a natural number n is divisible by exactly r different prime numbers $p_1, \ldots, p_r$, then the number of positive integers less than n and relatively prime to n is equal to

$\varphi(n) = n\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_r}\right)$.

The function $\varphi : \mathbb{N} \to \mathbb{N}$ is called Euler's function. Show that this formula holds for $n \le 25$, and also for n of the form $n = p^m$ (see also subsection 4 of §1, Ch. 9).


4. Using the binomial formula and induction on n, prove that, if p is a prime, $n^p - n$ is divisible by p for any $n \in \mathbb{Z}$.


Chapter 2. Sources of Algebra 


The rectangular matrices introduced in §3 of Chapter 1 occur so often that an 
independent branch of mathematics called matrix theory has evolved. Although it arose in 
the middle of the last century, it acquired a complete and elegant form somewhat later, when 
linear algebra developed. To this day matrix theory remains a tool well-suited both to 
applied problems and to the abstract constructions of modern theoretical mathematics. Here 
we shall present the simplest results of matrix theory. 

The title of the chapter may give rise to the illusion that we intend to rely upon 
geometry to describe our purely algebraic objects. But actually, it is only that we find it 
convenient and efficient to express the properties of matrices and solutions of linear systems 
in a language borrowed from geometry. The concepts of a space, a vector, linear 
dependence, the rank of a system, etc., are developed precisely to the extent that they are 
needed for our immediate purposes. Our approach is basically algebraic, and we do not give
geometrical intuition the key role that it plays in some other treatments of the subject. 

Linear spaces will be necessary to us in order to speak of linear mappings, the 
companion concept to matrices, It is composition of mappings (subsection J WSs, (dq MW) 


that gives the most natural explanation of matrix multiplication. 


§1. Vector spaces


1. Motivation. When we studied systems of linear equations, we had to consider rows of length n in various contexts. They were the rows $(a_{i1}, a_{i2}, \ldots, a_{in})$, $1 \le i \le m$, of an $m \times n$ matrix $A = (a_{ij})$, and also the solutions $(x_1^0, x_2^0, \ldots, x_n^0)$ of the linear system with matrix A. The elementary transformations of type (II) that were used in §3 of Chapter 1 to reduce a matrix to step form involved two basic operations: multiplying rows by a number, and adding two rows. The same operations can be performed upon the solutions of a homogeneous linear system. That is, if $(x'_1, x'_2, \ldots, x'_n)$ and $(x''_1, x''_2, \ldots, x''_n)$ are two solutions of the system

$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = 0, \quad i = 1, 2, \ldots, m$,

and if $\alpha$ and $\beta$ are any two real numbers, then the row

$(\alpha x'_1 + \beta x''_1,\ \alpha x'_2 + \beta x''_2,\ \ldots,\ \alpha x'_n + \beta x''_n)$

will also be a solution of our system:

$a_{i1}(\alpha x'_1 + \beta x''_1) + a_{i2}(\alpha x'_2 + \beta x''_2) + \cdots + a_{in}(\alpha x'_n + \beta x''_n)$
$= \alpha(a_{i1} x'_1 + a_{i2} x'_2 + \cdots + a_{in} x'_n) + \beta(a_{i1} x''_1 + a_{i2} x''_2 + \cdots + a_{in} x''_n) = 0$.


On the other hand, any row of length n, no matter what it stands for, is an element of the "universal" set $\mathbb{R}^n$, the n-th cartesian power of the set $\mathbb{R}$ of real numbers. So it would be worthwhile to study this general object; its properties can then be carried over to matrices and to the solutions of homogeneous systems.
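The closure of the solution set of a homogeneous system under the two row operations can be seen on a concrete system. Everything specific in this sketch is an assumption chosen for illustration: the 2 × 4 matrix `A`, the two solutions `X1` and `X2`, the scalars 3 and -2, and the helper names.

```python
# A homogeneous system with matrix A (two equations, four unknowns).
A = [[1, 1, 1, 1],
     [1, -1, 1, -1]]

def is_solution(A, x):
    """True if every equation a_i1*x_1 + ... + a_in*x_n = 0 is satisfied."""
    return all(sum(a * xi for a, xi in zip(row, x)) == 0 for row in A)

def combine(alpha, x1, beta, x2):
    """The row alpha*x1 + beta*x2, computed coordinate by coordinate."""
    return tuple(alpha * a + beta * b for a, b in zip(x1, x2))

X1 = (1, 0, -1, 0)
X2 = (0, 1, 0, -1)
Z = combine(3, X1, -2, X2)     # (3, -2, -3, 2)

both_solve = is_solution(A, X1) and is_solution(A, X2)
combination_solves = is_solution(A, Z)
```

A row that is not a solution, such as (1, 0, 0, 0), is rejected by `is_solution`, so the check is not vacuous.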


2. Basic definitions. Let n be a fixed natural number. The n-dimensional vector space over $\mathbb{R}$ is the set $\mathbb{R}^n$ (whose elements are called row-vectors, or simply vectors), considered along with the operations of adding vectors and multiplying vectors by real numbers (real numbers will be called scalars). We shall denote scalars by small Latin or Greek letters, and we shall denote vectors by capital Latin letters, like matrices. In fact, a vector $X = (x_1, x_2, \ldots, x_n)$ can be thought of as a $1 \times n$ matrix.

Let $Y = (y_1, y_2, \ldots, y_n)$ be another vector, and let $\lambda$ be a scalar. By definition

$X + Y = (x_1 + y_1,\ x_2 + y_2,\ \ldots,\ x_n + y_n)$,
$\lambda X = (\lambda x_1,\ \lambda x_2,\ \ldots,\ \lambda x_n)$.

We shall let the usual symbol 0 denote both the real number 0 and also the zero vector $(0, 0, \ldots, 0)$. In addition, it is customary to identify $\mathbb{R}^1$ with $\mathbb{R}$.

The formal properties of operations with real numbers, which are well known to the reader, carry over to $\mathbb{R}^n$. Although it is boring to list them, doing so gives a precise idea of how one can define an abstract vector space; such abstract vector spaces (not necessarily having real number coordinates or finitely many dimensions) are important in many fields. Here is the list of properties satisfied by the operations of addition and scalar multiplication on a vector space:

VS_1: X + Y = Y + X for any vectors $X, Y \in \mathbb{R}^n$ (commutative law);

VS_2: (X + Y) + Z = X + (Y + Z) for any three vectors $X, Y, Z \in \mathbb{R}^n$ (associative law);

VS_3: there exists a special vector 0 such that X + 0 = X for all $X \in \mathbb{R}^n$;

VS_4: every $X \in \mathbb{R}^n$ has a negative (additive inverse) vector -X such that X + (-X) = 0;

VS_5: $1 \cdot X = X$ for all $X \in \mathbb{R}^n$;

VS_6: $(\alpha\beta)X = \alpha(\beta X)$ for all $\alpha, \beta \in \mathbb{R}$, $X \in \mathbb{R}^n$;

VS_7: $(\alpha + \beta)X = \alpha X + \beta X$ (distributive law for scalars);

VS_8: $\alpha(X + Y) = \alpha X + \alpha Y$ (distributive law for vectors).

The uniqueness of the vectors 0 and -X in VS_3 and VS_4, and the other simple consequences of these rules (which are called axioms if we have in mind an abstract vector space), will not be derived here, since the derivations are very easy and can be left to the reader.


We referred to $\mathbb{R}^n$ as an n-dimensional space, but the notion of dimension will only acquire a precise meaning at the end of the section, after a little preliminary material. The origin of the term "vector space" is clear after studying analytic geometry, where one learns of the one-to-one correspondence between points (vectors) in the cartesian plane and ordered pairs (x,y). Adding vectors by the parallelogram rule and multiplying them by real numbers precisely correspond to the operations on row-vectors of $\mathbb{R}^2$, as defined above.

In addition to the vector space of row-vectors $(x_1, x_2, \ldots, x_n)$ of length n, one can consider the vector space of column-vectors of height n


as we agreed to denote them in §3 of Chapter 1. There is clearly no essential difference 
between these two vector spaces, but we shall soon see that it is useful to have both versions 
of a vector space. It is usually clear from the context whether one is talking about row- or 
column-vectors, so that we shall not introduce any further notation to distinguish the two 


types of vectors.


Let V be a non-empty subset of $\mathbb{R}^n$. We shall call V a linear subspace of $\mathbb{R}^n$ if

$X, Y \in V \Rightarrow \alpha X + \beta Y \in V$   (1)

for all $\alpha, \beta \in \mathbb{R}$. (This definition at first glance might not seem satisfactory, since it does not explain in what sense V is a "space", but we shall say some words in its defense at the end of the section.) Note that (1) implies that the zero vector always belongs to V.

For example, the set of all row-vectors $(x_1, \ldots, x_{n-1}, 0)$ with $x_n = 0$ is a linear subspace of $\mathbb{R}^n$; it is customary to identify this subspace with $\mathbb{R}^{n-1}$. We have the "chain" of imbedded subspaces

$0 \subset \mathbb{R}^1 \subset \mathbb{R}^2 \subset \cdots \subset \mathbb{R}^{n-1} \subset \mathbb{R}^n$.

The solutions of the homogeneous equation $x_1 + x_2 + \cdots + x_n = 0$ make up a subspace in $\mathbb{R}^n$, n > 1, which is different from the zero subspace and from all the $\mathbb{R}^i$ in the above chain. Other examples will be given below.


3. Linear combinations. Linear span. Let $X_1, X_2, \ldots, X_k$ be vectors in $\mathbb{R}^n$, and let $\alpha_1, \alpha_2, \ldots, \alpha_k$ be scalars. The vector $X = \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_k X_k$ is called a linear combination of the vectors $X_i$ with coefficients $\alpha_i$. For example, (2,3,5,5) - 3(1,1,1,1) + 2(1,0,-1,-1) = (1,0,0,0). Now let $Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$ be a linear combination of the same vectors $X_i$ with coefficients $\beta_i$, and let $\alpha, \beta \in \mathbb{R}$. Then

$\alpha X + \beta Y = \alpha(\alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_k X_k) + \beta(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)$
$= (\alpha\alpha_1 + \beta\beta_1) X_1 + (\alpha\alpha_2 + \beta\beta_2) X_2 + \cdots + (\alpha\alpha_k + \beta\beta_k) X_k$

is also a linear combination of the vectors $X_i$; its coefficients are $\alpha\alpha_i + \beta\beta_i$. We thus see that the set of all linear combinations of a given set of vectors $X_1, X_2, \ldots, X_k$ is a linear subspace of $\mathbb{R}^n$. We denote this subspace by the symbol $\langle X_1, X_2, \ldots, X_k \rangle$, and we call it the linear span of the set of vectors $X_1, X_2, \ldots, X_k$. It is also customary to say that the space $\langle X_1, X_2, \ldots, X_k \rangle$ is spanned or is generated by the vectors $X_1, X_2, \ldots, X_k$.

It is possible to define the linear span of any subset $S \subset \mathbb{R}^n$: $\langle S \rangle$ is the set of all linear combinations of finite sets of vectors in S. Clearly, if V is a subspace of $\mathbb{R}^n$, then $\langle V \rangle = V$, since any linear combination of vectors in V belongs to V. More generally, since $S \subset V \Rightarrow \langle S \rangle \subset V$, it follows that the linear span $\langle S \rangle$ can be defined as the intersection of all subspaces containing the given set S of vectors of $\mathbb{R}^n$:

$\langle S \rangle = \bigcap_{S \subset V} V$.   (2)

It might not be obvious at first glance, but the intersection of any set of subspaces, as in (2), will always be a subspace. Namely, if $X, Y \in \bigcap V$, then $X, Y \in V$ for every V in the set of subspaces. Hence, $\alpha X + \beta Y \in V$ for all $\alpha, \beta \in \mathbb{R}$; since this is true for all V, we have $\alpha X + \beta Y \in \bigcap V$, as required.
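The numerical example of a linear combination given above, and the rule that the coefficients of $\alpha X + \beta Y$ are $\alpha\alpha_i + \beta\beta_i$, can be verified in a few lines. The helper name `linear_combination`, the second coefficient list (0, 1, 1) and the scalars 2 and 5 are assumptions made for this illustration.

```python
def linear_combination(coeffs, vectors):
    """The vector c_1*X_1 + ... + c_k*X_k for row-vectors of equal length."""
    n = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n))

vectors = [(2, 3, 5, 5), (1, 1, 1, 1), (1, 0, -1, -1)]

a = [1, -3, 2]                              # coefficients of X
b = [0, 1, 1]                               # coefficients of Y
X = linear_combination(a, vectors)          # the example from the text
Y = linear_combination(b, vectors)

alpha, beta = 2, 5
lhs = tuple(alpha * x + beta * y for x, y in zip(X, Y))
rhs = linear_combination([alpha * ai + beta * bi for ai, bi in zip(a, b)], vectors)
```

The equality of `lhs` and `rhs` is the computation that shows the set of all linear combinations to be closed under (1), i.e., to be a subspace.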

Unlike intersections, the union of two subspaces U and V is not, in general, a subspace. For example, let $U = \{(\lambda, 0) \mid \lambda \in \mathbb{R}\}$, $V = \{(0, \mu) \mid \mu \in \mathbb{R}\}$ in $\mathbb{R}^2$. The linear span $\langle U \cup V \rangle$ is called the sum of the subspaces U and V:

$U + V = \langle U \cup V \rangle = \{u + v \mid u \in U,\ v \in V\}$.

If $U \cap V = 0$, we say that the sum U + V is direct, and we write $U \oplus V$. Let $V = V_1 \oplus V_2$, and let $X = X_1 + X_2 = X'_1 + X'_2$ be two expressions for a vector $X \in V$ as a linear combination of vectors $X_1, X'_1 \in V_1$ and $X_2, X'_2 \in V_2$. Then we have $X_1 - X'_1 = X'_2 - X_2 \in V_1 \cap V_2$, but since $V_1 \cap V_2 = 0$, we have $X_1 = X'_1$, $X_2 = X'_2$. Conversely, if every $X \in V = V_1 + V_2$ can be written as $X = X_1 + X_2$, $X_1 \in V_1$, $X_2 \in V_2$, in a unique way, then the sum $V = V_1 + V_2$ is direct (we leave this as an exercise). More generally, if V is a sum of subspaces $V_1, \ldots, V_k \subset \mathbb{R}^n$, we call V the direct sum and write $V = V_1 \oplus \cdots \oplus V_k$ if every vector in V can be written uniquely as a sum of vectors in the $V_i$.


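Example 1. Consider the following two sets in $\mathbb{R}^n$ ($m \le n$):

$U_m = \{(\lambda_1, \ldots, \lambda_m, 0, \ldots, 0) \mid \lambda_i \in \mathbb{R}\}$

and

$V_m = \{(0, \ldots, 0, \lambda_{m+1}, \ldots, \lambda_n) \mid \lambda_i \in \mathbb{R}\}$.

One immediately checks that $U_m$ and $V_m$ are subspaces of $\mathbb{R}^n$, that $U_m + V_m = \mathbb{R}^n$ and that $U_m \cap V_m = 0$. Hence $\mathbb{R}^n = U_m \oplus V_m$.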
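Example 2. In $\mathbb{R}^n$ we consider the so-called unit row-vectors

$E_{(1)} = (1, 0, \ldots, 0),\ E_{(2)} = (0, 1, \ldots, 0),\ \ldots,\ E_{(n)} = (0, 0, \ldots, 1)$.   (3)

Every vector $X = (x_1, x_2, \ldots, x_n)$ can be uniquely written in the form $X = x_1 E_{(1)} + x_2 E_{(2)} + \cdots + x_n E_{(n)}$. Hence,

$\mathbb{R}^n = \langle E_{(1)} \rangle \oplus \langle E_{(2)} \rangle \oplus \cdots \oplus \langle E_{(n)} \rangle$.

In an analogous way, we shall denote the unit column-vectors by the symbols

$E^{(1)} = [1, 0, \ldots, 0],\ E^{(2)} = [0, 1, \ldots, 0],\ \ldots,\ E^{(n)} = [0, 0, \ldots, 1]$.   (3')

We shall use the notation $E_{(i)}$ and $E^{(i)}$ later.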


4. Linear dependence. A set of vectors $X_1, \ldots, X_k$ in $\mathbb{R}^n$ is called linearly dependent if there exist k numbers $\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that

$\alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_k X_k = 0$   (4)

(the right side is the zero vector). We say that (4) is a non-trivial linear dependence relation. On the other hand, if $\alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_k X_k = 0 \Rightarrow \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, then the vectors $X_1, X_2, \ldots, X_k$ are called linearly independent.

Example 2 above shows that the unit vectors $E_{(1)}, E_{(2)}, \ldots, E_{(n)}$ are linearly independent. A single non-zero vector X is always linearly independent, since $\alpha X = 0$, $X \ne 0 \Rightarrow \alpha = 0$. Also note that the property of linear independence of a set of vectors $X_1, \ldots, X_k$ does not depend on the order in which the vectors are taken, since the terms $\alpha_i X_i$ in (4) can be permuted in any way desired.
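For concrete vectors, linear dependence can be decided by reducing the matrix whose rows are the given vectors to step form, as in §3 of Chapter 1: the vectors are linearly independent exactly when no zero row appears. The sketch below is an illustrative implementation under that assumption, using exact rational arithmetic; the function names `rank` and `linearly_independent` are hypothetical, and the word "rank" is used informally for the number of non-zero rows in the step form.

```python
from fractions import Fraction

def rank(vectors):
    """Number of non-zero rows after reducing the rows to step form."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        # Find a row at or below position r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    return rank(vectors) == len(vectors)

unit_vectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
dependent = [(2, 3, 5, 5), (1, 1, 1, 1), (1, 0, -1, -1), (1, 0, 0, 0)]
```

The second list is dependent because of the relation (2,3,5,5) - 3(1,1,1,1) + 2(1,0,-1,-1) - (1,0,0,0) = 0 from the preceding subsection, while its first three vectors on their own are independent.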


THEOREM 1. (i) If any subset of $\{X_1, \ldots, X_k\}$ is linearly dependent, then the entire set of vectors is linearly dependent.

(ii) Any subset of a linearly independent set of vectors $X_1, \ldots, X_k$ is linearly independent.

(iii) If the vectors $X_1, \ldots, X_k$ are linearly dependent, then at least one is a linear combination of the others.

(iv) If one of the vectors $X_1, \ldots, X_k$ is a linear combination of the others, then the vectors $X_1, \ldots, X_k$ are linearly dependent.

(v) If the vectors $X_1, \ldots, X_k$ are linearly independent, and the vectors $X_1, \ldots, X_k, X$ are linearly dependent, then X is a linear combination of the vectors $X_1, \ldots, X_k$.

(vi) If the vectors $X_1, \ldots, X_k$ are linearly independent, and the vector $X_{k+1}$ cannot be expressed as a linear combination of $X_1, \ldots, X_k$, then the set $X_1, \ldots, X_k, X_{k+1}$ is linearly independent.


Proof. (i) Suppose, for example, that the first $s$ vectors $X_1, \dots, X_s$, $s < k$,
are linearly dependent, i.e.,

$$\alpha_1 X_1 + \cdots + \alpha_s X_s = 0 ,$$

where the $\alpha_1, \dots, \alpha_s$ are not all zero. If we then set
$\alpha_{s+1} = \cdots = \alpha_k = 0$, we obtain a non-trivial linear dependence relation

$$\alpha_1 X_1 + \cdots + \alpha_s X_s + \alpha_{s+1} X_{s+1} + \cdots + \alpha_k X_k = 0 .$$


(ii) follows immediately from (i) (proof by contradiction). 


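(iii) Suppose, for example, that $\alpha_k \neq 0$ in (4). Then

$$X_k = -\frac{\alpha_1}{\alpha_k} X_1 - \cdots - \frac{\alpha_{k-1}}{\alpha_k} X_{k-1} .$$

(iv) Suppose, for example, that $X_k = \beta_1 X_1 + \cdots + \beta_{k-1} X_{k-1}$. If we set
$\alpha_1 = \beta_1, \dots, \alpha_{k-1} = \beta_{k-1}$, $\alpha_k = -1$, we arrive at the relation (4)
with $\alpha_k \neq 0$.

(v) If we have a non-trivial relation

$$\beta_1 X_1 + \cdots + \beta_k X_k + \beta X = 0$$

in which $\beta \neq 0$, then we obtain what we want as in (iii). But we cannot have $\beta = 0$,
since $X_1, \dots, X_k$ were assumed to be linearly independent.

(vi) follows immediately from (v). $\square$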


5. Bases. Dimension. We now give an important

Definition. Let $V$ be a subspace of $\mathbb{R}^n$. A set of vectors $X_1, \dots, X_r \in V$
is called a basis for $V$ if it is linearly independent and its linear span coincides with $V$:
$\langle X_1, \dots, X_r \rangle = V$.

From this definition and the definition of the linear span of a set of vectors it follows
that every vector $X \in V$ can be expressed in a unique way in the form
$X = \alpha_1 X_1 + \cdots + \alpha_r X_r$. The coefficients $\alpha_1, \dots, \alpha_r \in \mathbb{R}$
are called the coordinates of $X$ in the basis $X_1, \dots, X_r$.

We have seen that the linearly independent unit vectors (3) span $\mathbb{R}^n$. Hence,
$\{E_{(1)}, \dots, E_{(n)}\}$ is a basis of $\mathbb{R}^n$. This basis, which is far from the only
basis of $\mathbb{R}^n$, is called the standard basis. The reader can verify that, for example,
the vectors


By Slog a Beri Be Se alee lea Ot 


ay Be ea Ee 
also make up a basis of $\mathbb{R}^n$.

We have not yet answered the important questions: does every subspace of $\mathbb{R}^n$ have
a basis, and, if so, is the number of basis vectors the same in each basis? Both questions
have a positive answer. To see this we shall need the following lemma.

LEMMA. Let $V$ be a subspace of $\mathbb{R}^n$ with basis $X_1, \dots, X_r$, and let
$Y_1, \dots, Y_s$ be a linearly independent set of vectors in $V$. Then $s \le r$.


Proof. $Y_1, \dots, Y_s$, like all vectors in $V$, are linear combinations of the basis
vectors. Let

$$\begin{aligned}
Y_1 &= a_{11} X_1 + a_{21} X_2 + \cdots + a_{r1} X_r , \\
&\;\;\vdots \\
Y_s &= a_{1s} X_1 + a_{2s} X_2 + \cdots + a_{rs} X_r ,
\end{aligned}$$

where the $a_{ij}$ are scalars (since they are the coordinates of the $Y_j$, they are uniquely
determined, but at this point we are not concerned about that). We use proof by contradiction.
Suppose $s > r$.

We form a linear combination of the vectors $Y_j$ with coefficients $x_j$:

$$x_1 Y_1 + \cdots + x_s Y_s =
(a_{11} x_1 + a_{12} x_2 + \cdots + a_{1s} x_s) X_1 + \cdots +
(a_{r1} x_1 + a_{r2} x_2 + \cdots + a_{rs} x_s) X_r ,$$

and we consider the following system of $r$ linear equations with $s$ unknowns:

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1s} x_s &= 0 , \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{r1} x_1 + a_{r2} x_2 + \cdots + a_{rs} x_s &= 0 .
\end{aligned}$$

Since we have supposed that $s > r$, Corollary 2 of §3, Ch. 1 applies. It says that our
system has a non-zero solution $(x_1^0, x_2^0, \dots, x_s^0)$. But this gives us a non-trivial linear
dependence relation

$$x_1^0 Y_1 + x_2^0 Y_2 + \cdots + x_s^0 Y_s = 0 ,$$

which contradicts the hypothesis of the lemma. Hence, $s \le r$. $\square$
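The key step of the proof is that a homogeneous system with more unknowns than equations has a non-zero solution. The sketch below illustrates that step under stated assumptions: the coefficient matrix has rational entries, and the function name `nontrivial_solution` is hypothetical. It reduces the matrix fully, sets one free unknown to 1, and reads off the principal unknowns.

```python
from fractions import Fraction

def nontrivial_solution(a):
    """Return a non-zero x with a*x = 0 for an r x s matrix a, assuming s > r."""
    r, s = len(a), len(a[0])
    assert s > r, "the argument needs more unknowns than equations"
    m = [[Fraction(v) for v in row] for row in a]
    pivots = []   # pivot column of each reduced row, in order
    row = 0
    for c in range(s):
        p = next((i for i in range(row, r) if m[i][c] != 0), None)
        if p is None:
            continue
        m[row], m[p] = m[p], m[row]
        pivot = m[row][c]
        m[row] = [v / pivot for v in m[row]]
        for i in range(r):
            if i != row and m[i][c] != 0:
                f = m[i][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[row])]
        pivots.append(c)
        row += 1
    # at most r pivot columns exist, so some column is free because s > r
    free = next(c for c in range(s) if c not in pivots)
    x = [Fraction(0)] * s
    x[free] = Fraction(1)
    for i, c in enumerate(pivots):
        x[c] = -m[i][free]
    return x

# Columns hold the coordinates of Y_1 = X_1, Y_2 = X_2, Y_3 = X_1 + X_2 (r = 2, s = 3).
a = [[1, 0, 1],
     [0, 1, 1]]
x = nontrivial_solution(a)
print(x)  # coefficients of a non-trivial relation among Y_1, Y_2, Y_3
```

Here the returned coefficients express the dependence $-Y_1 - Y_2 + Y_3 = 0$, which is exactly the kind of relation the lemma rules out for an independent set.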


THEOREM 2. Every non-zero subspace $V \subset \mathbb{R}^n$ has a finite basis. All bases of
$V$ have the same number $r \le n$ of vectors. (This number $r$ is called the dimension
of $V$ and is denoted $\dim_{\mathbb{R}} V$ or simply $\dim V$.)


Proof. By assumption, $V \neq 0$. Let $X_1$ be any non-zero vector in $V$. Suppose
that we have found $k$ linearly independent vectors in $V$: $X_1, \dots, X_k$. If the linear span
$\langle X_1, \dots, X_k \rangle$ does not coincide with $V$, then we choose any vector $X_{k+1}$ in $V$
which is not in $\langle X_1, \dots, X_k \rangle$. In other words, $X_{k+1}$ is not a linear combination of
the vectors $X_1, \dots, X_k$. By Theorem 1 (vi), the set $X_1, \dots, X_k, X_{k+1}$ is linearly
independent. This process cannot go on indefinitely, since all of the vectors $X_i$ lie in
$\mathbb{R}^n = \langle E_{(1)}, \dots, E_{(n)} \rangle$, and, by the above lemma, no linearly independent set
of vectors in $\mathbb{R}^n$ can contain more than $n$ vectors. Hence, for some $r \le n$ the linearly
independent set $X_1, \dots, X_k, \dots, X_r \in V$ becomes maximal, i.e., no matter what vector
$X \in V$ we add to the set, $\{X_1, \dots, X_r, X\}$ is linearly dependent. By Theorem 1 (v), this
means that $X \in \langle X_1, \dots, X_r \rangle$ for every $X \in V$. Hence, $V = \langle X_1, \dots, X_r \rangle$ and the
vectors $X_1, \dots, X_r$ are a basis for $V$.

Now suppose that $Y_1, \dots, Y_s$ is another basis of $V$. By the lemma, we have:
$s \le r$. But if we interchange the roles of $X_1, \dots, X_r$ and $Y_1, \dots, Y_s$ in the lemma,
we similarly obtain: $r \le s$. Hence, $s = r$, and the theorem is proved. $\square$
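The construction in the proof is effectively a greedy procedure: keep a vector whenever it is not in the span of those already kept. A minimal sketch follows, assuming the subspace is given as the span of a finite list of rational vectors; `pivot_count` and `greedy_basis` are illustrative names, and the span-membership test is done by elimination rather than by the abstract argument of the text.

```python
from fractions import Fraction

def pivot_count(rows):
    """Number of pivots found by Gaussian elimination on the given row vectors."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def greedy_basis(spanning_vectors):
    """Pick a basis of the span by adding each vector that enlarges the span."""
    basis = []
    for x in spanning_vectors:
        # x lies outside the span of `basis` exactly when the pivot count grows
        if pivot_count(basis + [x]) > len(basis):
            basis.append(x)
    return basis

vectors = [[1, 2, 3], [2, 4, 6], [0, 1, 0], [1, 3, 3]]
basis = greedy_basis(vectors)
print(basis, "dimension:", len(basis))
```

Feeding the same vectors in a different order may produce a different basis, but by the theorem the number of vectors kept is always the same.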


Note that all of the above discussion applies equally well to spaces of row-vectors or
column-vectors.

Theorem 2 allows us to associate to every non-zero linear subspace $V$ of $\mathbb{R}^n$ a positive
integer $r \le n$, which we call the dimension of $V$: $r = \dim V$. In particular,
$\dim \mathbb{R}^n = n$. This important number can be characterized in other ways (see the exercises).
One possible definition of dimension is based on the notion of the rank of a set of vectors.
Namely, if $\{X_1, X_2, \dots\}$ is a set of vectors (possibly infinite) in $\mathbb{R}^n$, then we now
know that the dimension of the linear span $\langle X_1, X_2, \dots \rangle$ is no greater than $n$. We call
this dimension the rank of $\{X_1, X_2, \dots\}$:

$$\operatorname{rank}\{X_1, X_2, \dots\} = \dim \langle X_1, X_2, \dots \rangle .$$


Finally, some words to justify the term "linear subspace". In a linear subspace
$V \subset \mathbb{R}^n$ choose an arbitrary basis $X_1, \dots, X_r$. Then
$X = \alpha_1 X_1 + \cdots + \alpha_r X_r$ for every $X \in V$, and the set $V$ is in one-to-one
correspondence with the set of all rows $(\alpha_1, \dots, \alpha_r)$ of length $r$. Under this
correspondence, a linear combination of vectors corresponds to the same linear combination
of the rows of coordinates. Hence, once we choose a basis of $V$, we can interpret $V$ as the
vector space $\mathbb{R}^r$ imbedded in a certain way in $\mathbb{R}^n$ ($n \ge r$).
EXERCISES 


1. Let $V$, $V_1$ and $V_2$ be subspaces of $\mathbb{R}^n$, where $V \subset V_1 + V_2$. Is it
always true that $V = V \cap V_1 + V \cap V_2$? What can be said about this in the special case


when VS Wee 


2. Let $V$ be a subspace of $\mathbb{R}^n$. If $V = U \oplus W$, then the subspace $W$ is
called a complement of $U$ in $V$, and $U$ is called a complement of $W$ in $V$.
Does $U$ have only one complement in $V$? Compare $W$ with the set-theoretic notion of
complement $V \setminus U$ (see §4, Ch. 1).

3. Show that the vectors $X_1 = (1,2,3)$, $X_2 = (3,2,1)$ are linearly independent;
consider the linear span $V = \langle X_1, X_2 \rangle$; show that the vector $X = (-5,2,9)$ belongs to
$V$, and find its coordinates in the basis $X_1, X_2$; find a complement (any complement) of
$V$ in $\mathbb{R}^3$.

4. Show that a set of $n$ vectors in $\mathbb{R}^n$ spans $\mathbb{R}^n$ if and only if it is linearly
independent.

5. Show that every linearly independent set of vectors $X_1, \dots, X_k$ in a subspace
$V \subset \mathbb{R}^n$ can be included in (extended to) a basis for $V$.

6. Let $U$ and $V$ be subspaces of $\mathbb{R}^n$. Prove that if $U \cap V = 0$, then
$\dim(U + V) = \dim U + \dim V$.

7. Find the rank of the set of vectors (0,1,1), (1,0,1), (1,1,0).




§2. The rank of a matrix

1. Back to equations. In the vector space $\mathbb{R}^m$ of columns of height $m$,
consider $n$ vectors

$$A^{(j)} = [a_{1j}, a_{2j}, \dots, a_{mj}] , \qquad j = 1, 2, \dots, n ,$$

and their linear span $V = \langle A^{(1)}, A^{(2)}, \dots, A^{(n)} \rangle$. Suppose we are given another
vector $B = [b_1, b_2, \dots, b_m]$. We would like to know whether $B$ belongs to the subspace
$V \subset \mathbb{R}^m$, and, if so, how its coordinates $b_1, \dots, b_m$ (in the standard basis, see (3') of
§1) can be expressed in terms of the coordinates of the vectors $A^{(j)}$. In the case
$\dim V = n$, i.e., when the $A^{(j)}$ form a basis of $V$, the second part of the question asks for
the coordinates of $B$ in the basis $A^{(1)}, \dots, A^{(n)}$.

To answer this question, we take a linear combination of the vectors $A^{(j)}$ with
arbitrary coefficients $x_j$, and write the equation $x_1 A^{(1)} + \cdots + x_n A^{(n)} = B$. Writing
this in the form


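$$x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} , \tag{1}$$

we see that we have a system of $m$ linear equations with $n$ unknowns:

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 , \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 , \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m .
\end{aligned} \tag{2}$$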


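This is the type of system we first discussed in §3 of Chapter 1. There we introduced the
matrix and the extended matrix of the linear system (2):

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix} , \qquad
(A \mid B) = \left( \begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array} \right) . \tag{3}$$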


It may seem at first that we are back where we started, having wasted time and accomplished 
nothing. But, in fact, we now have several important concepts at our disposal. We need only 
become accustomed to using them. 


At this point it is useful to agree on some notation. We shall often abbreviate a sum
$s_1 + s_2 + \cdots + s_n$ by writing $\sum_{i=1}^{n} s_i$. Here $s_1, \dots, s_n$ can be anything (numbers,
row vectors, etc.) which satisfies the usual rules of addition of numbers or vectors. We
have:

$$\sum_{i=1}^{n} (s_i + t_i) = \sum_{i=1}^{n} s_i + \sum_{i=1}^{n} t_i .$$

We shall also consider double sums

$$\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij} ,$$

in which the order of summation can be chosen in whichever way is convenient. It is easy to
see that the sum does not depend on the order of summation, if we arrange the $a_{ij}$ in the
form of an $m \times n$ rectangular matrix; it makes no difference whether we sum the entries
along the rows or along the columns.

Other types of summation will be introduced when we need them. 


2. The rank of a matrix. By the column space of a rectangular $m \times n$ matrix $A$
(see (3)) we mean the space $V = \langle A^{(1)}, A^{(2)}, \dots, A^{(n)} \rangle$ introduced above. We shall
denote this space by the symbol $V_v(A)$ or simply $V_v$ (v for vertical). We call its
dimension $r_v(A) = \dim V_v$ the column rank of the matrix $A$. We similarly define the
row rank of $A$: $r_h(A) = \dim V_h$, where $V_h = \langle A_{(1)}, A_{(2)}, \dots, A_{(m)} \rangle$ is the subspace of
$\mathbb{R}^n$ spanned by the row-vectors $A_{(i)} = (a_{i1}, a_{i2}, \dots, a_{in})$, $i = 1, 2, \dots, m$ (h
for horizontal). In other words,

$$r_v(A) = \operatorname{rank}\{A^{(1)}, \dots, A^{(n)}\} , \qquad
r_h(A) = \operatorname{rank}\{A_{(1)}, \dots, A_{(m)}\}$$

are the ranks of the set of column-vectors and the set of row-vectors, respectively. By
Theorem 2 of §1 the numbers $r_v(A)$ and $r_h(A)$ are well defined.
Following the definition in §3 of Chapter 1, we shall say that a matrix $A'$ is
obtained from $A$ using an elementary transformation of type (I) if $A'_{(s)} = A_{(t)}$,
$A'_{(t)} = A_{(s)}$ for some pair of indices $s \neq t$ and $A'_{(i)} = A_{(i)}$ for $i \neq s, t$. We shall say
that $A'$ is obtained from $A$ using an elementary transformation of type (II) if
$A'_{(i)} = A_{(i)}$ for all $i \neq s$ and $A'_{(s)} = A_{(s)} + \lambda A_{(t)}$, $s \neq t$, $\lambda \in \mathbb{R}$.

Note that both types of elementary transformations are invertible, i.e., the matrix
$A'$ obtained from $A$ using an elementary transformation can be changed back into $A$
using another elementary transformation (in fact, of the same type).


LEMMA. If $A'$ is obtained from $A$ using a finite sequence of elementary
transformations, then:

(i) $r_h(A') = r_h(A)$;

(ii) $r_v(A') = r_v(A)$.

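Proof. It is sufficient to consider the case when $A'$ is obtained from $A$ using a
single elementary transformation.

(i) An elementary transformation of type (I) clearly does not change $r_h(A)$,
since the linear span of a set of vectors does not depend on the order in which they are listed:
$\langle A_{(1)}, \dots, A_{(s)}, \dots, A_{(t)}, \dots, A_{(m)} \rangle = \langle A_{(1)}, \dots, A_{(t)}, \dots, A_{(s)}, \dots, A_{(m)} \rangle$. Next,
since $A_{(s)} + \lambda A_{(t)}$ is a linear combination of $A_{(s)}$ and $A_{(t)}$, while conversely
$A_{(s)} = (A_{(s)} + \lambda A_{(t)}) - \lambda A_{(t)}$, we have

$$\langle A_{(1)}, \dots, A_{(s)} + \lambda A_{(t)}, \dots, A_{(t)}, \dots, A_{(m)} \rangle
= \langle A_{(1)}, \dots, A_{(s)}, \dots, A_{(t)}, \dots, A_{(m)} \rangle .$$

Thus, $r_h(A)$ does not change when an elementary transformation of type (II) is applied.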


(ii) Let $A'^{(j)}$, $j = 1, \dots, n$, be the columns of $A'$. It is enough to prove that

$$\sum_{j=1}^{n} \lambda_j A^{(j)} = 0 \iff \sum_{j=1}^{n} \lambda_j A'^{(j)} = 0 .$$

In that case every independent set of columns of one matrix corresponds to an independent
set of columns (the ones with the same indices) of the other matrix; in particular, maximal
independent sets of columns (i.e., bases) correspond, and so $r_v(A') = r_v(A)$. Further
note that, since elementary transformations are invertible, it suffices to prove the implication
in one direction. Suppose, for example, that $\sum_{j=1}^{n} \lambda_j A^{(j)} = 0$. Then, if we replace $x_j$
by $\lambda_j$ and all of the $b_i$ by $0$ in (1), we see that $(\lambda_1, \lambda_2, \dots, \lambda_n)$ is a solution of
the homogeneous system HS associated with the system (2). By Theorem 1 of Chapter
1, this solution is also a solution of the homogeneous system HS' which has $A'$ as its
matrix and is obtained from HS using an elementary transformation of type (I) or (II).
Since the system HS' can be written in the form $\sum_{j=1}^{n} x_j A'^{(j)} = 0$, we arrive at the relation
$\sum_{j=1}^{n} \lambda_j A'^{(j)} = 0$. $\square$


The basic result of this section is the following fact: 


THEOREM 1. $r_v(A) = r_h(A)$ for any rectangular $m \times n$ matrix $A$. (This
number is called simply the rank of $A$, and is denoted $\operatorname{rank} A$.)


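Proof. By Theorem 2 of §3, Ch. 1, the matrix $A$ can be reduced to the
following step form by applying a finite number of elementary transformations to the rows
of $A$:

$$\bar A = \begin{pmatrix}
\bar a_{11} & \cdots & \bar a_{1k} & \cdots & \bar a_{1l} & \cdots & \bar a_{1s} & \cdots & \bar a_{1n} \\
0 & \cdots & \bar a_{2k} & \cdots & \bar a_{2l} & \cdots & \bar a_{2s} & \cdots & \bar a_{2n} \\
0 & \cdots & 0 & \cdots & \bar a_{3l} & \cdots & \bar a_{3s} & \cdots & \bar a_{3n} \\
\vdots & & \vdots & & \vdots & & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & \bar a_{rs} & \cdots & \bar a_{rn} \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & & \vdots & & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0
\end{pmatrix} , \tag{4}$$

where $\bar a_{11} \bar a_{2k} \bar a_{3l} \cdots \bar a_{rs} \neq 0$. According to the lemma,

$$r_v(A) = r_v(\bar A) , \qquad r_h(A) = r_h(\bar A) ,$$

so that it is sufficient to prove that $r_v(\bar A) = r_h(\bar A)$.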


The columns of $A$ and $\bar A$ whose indices $1, k, l, \dots, s$ correspond to the
principal variables in (2) are called basis columns. We now show that this terminology is
justified. Suppose we have a linear dependence relation

$$\lambda_1 \bar A^{(1)} + \lambda_k \bar A^{(k)} + \lambda_l \bar A^{(l)} + \cdots + \lambda_s \bar A^{(s)} = 0$$

for the column-vectors $\bar A^{(1)} = [\bar a_{11}, 0, \dots, 0]$, $\bar A^{(k)} = [\bar a_{1k}, \bar a_{2k}, 0, \dots, 0]$, $\dots$,
$\bar A^{(s)} = [\bar a_{1s}, \bar a_{2s}, \dots, \bar a_{rs}, 0, \dots, 0]$ of the matrix (4). We then successively obtain:
$\lambda_s \bar a_{rs} = 0, \dots, \lambda_l \bar a_{3l} = 0$, $\lambda_k \bar a_{2k} = 0$, $\lambda_1 \bar a_{11} = 0$, and, since all the
$\bar a_{11}, \bar a_{2k}, \bar a_{3l}, \dots, \bar a_{rs}$ are non-zero, it follows that $\lambda_1 = \lambda_k = \lambda_l = \cdots = \lambda_s = 0$.
Hence, $\operatorname{rank}\{\bar A^{(1)}, \bar A^{(k)}, \bar A^{(l)}, \dots, \bar A^{(s)}\} = r$, and so $r_v(\bar A) \ge r$. But the space $V_v$
spanned by the columns of $\bar A$ can be identified with the space spanned by the columns of the
matrix obtained by deleting the last $m - r$ zero rows from $\bar A$. Hence,
$r_v(\bar A) = \dim V_v \le \dim \mathbb{R}^r = r$. Comparing the two inequalities shows that $r_v(\bar A) = r$.
(The inequality $r_v(\bar A) \le r$ also follows from the observation that all of the columns of $\bar A$
are linear combinations of the basis columns; we leave the details of this argument to the
reader as an exercise.)

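We now consider $r_h(\bar A)$. We prove that all of the non-zero rows of $\bar A$ are
linearly independent in the same way as we proved that the basis columns are linearly
independent. Namely, if we had a relation

$$\lambda_1 \bar A_{(1)} + \lambda_2 \bar A_{(2)} + \cdots + \lambda_r \bar A_{(r)} = 0 , \qquad \lambda_i \in \mathbb{R} ,$$

then we would obtain successively $\lambda_1 \bar a_{11} = 0$, $\lambda_2 \bar a_{2k} = 0, \dots, \lambda_r \bar a_{rs} = 0$, and then
$\lambda_1 = \lambda_2 = \cdots = \lambda_r = 0$. Thus, $r_h(\bar A) = r = r_v(\bar A)$. $\square$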
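The lemma and Theorem 1 can be checked numerically on a small example. The sketch below is an illustration under assumptions, not the book's procedure: matrices are lists of rows with rational entries, `rank_of_rows` counts pivots after elimination, and the column rank is obtained by applying the same routine to the transposed array.

```python
from fractions import Fraction

def rank_of_rows(rows):
    """Rank of the set of row vectors, computed by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]                 # transformation of type (I)
        for i in range(r + 1, len(m)):          # transformations of type (II)
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def columns(a):
    """The columns of a, each written out as a list."""
    return [list(col) for col in zip(*a)]

A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
row_rank = rank_of_rows(A)
column_rank = rank_of_rows(columns(A))
print(row_rank, column_rank)

# Type (II) transformation: add 5 times the third row to the first row.
A2 = [[a + 5 * c for a, c in zip(A[0], A[2])], A[1], A[2]]
print(rank_of_rows(A2), rank_of_rows(columns(A2)))
```

Both printed pairs agree with each other, which is what parts (i) and (ii) of the lemma together with Theorem 1 predict for this matrix.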

3. Solvability criterion. The step form of a matrix $A$, which allows us to answer
several questions about the linear system (see §3 of Chapter 1), involves an element of
arbitrary choice, because of the flexibility in our choice of basis columns. In spite of this,
our proof of Theorem 1 leads us to the following

COROLLARY. The number of principal variables in a linear system (2) does not
depend on the manner in which the system is reduced to step form. This number equals
$\operatorname{rank} A$, where $A$ is the matrix of the system.

Proof. We have seen that the number of principal variables is equal to the number of
non-zero rows in $\bar A$ (see (4)), which coincides with the rank of $A$. The rank of a matrix
was defined in an invariant way, i.e., it depends only on the matrix $A$, not on the step
form $\bar A$. $\square$


In the next chapter we shall find an effective method for computing the rank of a 
matrix A , which does not require reducing A to step form. Of course, such a method 
increases the value of any assertions based on the concept of rank. A simple but useful 


example of such an assertion is the following criterion for solvability of a linear system. 


THEOREM 2 (Kronecker-Capelli). A system of linear equations is compatible if and 
only if the rank of its matrix is equal to the rank of the extended matrix (see (3)). 


Proof. As explained at the beginning of this section, a system of linear equations (2)
can be written in the form (1), and so it is compatible if and only if the column-vector $B$
of free terms can be written as a linear combination of the column-vectors $A^{(j)}$ of the
matrix $A$. If it can be so written, then $B \in \langle A^{(1)}, \dots, A^{(n)} \rangle$, and hence
$\operatorname{rank}\{A^{(1)}, \dots, A^{(n)}\} = \operatorname{rank}\{A^{(1)}, \dots, A^{(n)}, B\}$, and
$\operatorname{rank} A = r_v(A) = r_v(A \mid B) = \operatorname{rank} (A \mid B)$ (see Theorem 1).

Conversely, if the ranks of $A$ and $(A \mid B)$ coincide, and $\{A^{(j_1)}, \dots, A^{(j_r)}\}$ is
a maximal linearly independent set of columns of $A$, then the set $\{A^{(j_1)}, \dots, A^{(j_r)}, B\}$ is
linearly dependent. By Theorem 1(v) of §1, this means that $B$ is a linear combination
of the columns $A^{(j_k)}$. Thus, the system (2) is compatible. $\square$
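The Kronecker-Capelli criterion translates directly into a rank comparison. The following sketch assumes rational coefficients and computes both ranks by elimination; the names `rank` and `is_compatible` are illustrative, and elimination is used here only because the rank method promised for the next chapter is not yet available.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_compatible(a, b):
    """True when rank A equals the rank of the extended matrix (A | B)."""
    extended = [list(row) + [bi] for row, bi in zip(a, b)]
    return rank(a) == rank(extended)

A = [[1, 1],
     [2, 2]]
print(is_compatible(A, [1, 2]))  # x + y = 1, 2x + 2y = 2
print(is_compatible(A, [1, 3]))  # x + y = 1, 2x + 2y = 3
```

In the second system the column of free terms is not in the span of the columns of $A$, so adjoining it raises the rank from 1 to 2.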
EXERCISES 


1. Prove Theorem 1 without reducing the $m \times n$ matrix $A = (a_{ij})$ to step form.

2. As in the case of rows, interchanging the $s$-th and $t$-th columns of a matrix
$A$ is called an elementary transformation of type (I), and adding the $t$-th column
multiplied by a scalar $\lambda$ to the $s$-th column is called an elementary transformation of
type (II). Describe the step form of $A$ obtained by applying elementary transformations
to the columns. Use elementary transformations of the columns to reduce the matrix $\bar A$
(see (4)) to the form


TD 
499 
os ae ’ 
0 | 
| E 
0 
ee os ge Ress, ie a eed 
= A) =f =] ‘ =a , a,. 
ge ees agar, 2enyese) See re i 


3. Show that, if ay # 0, then the square matrix 




0 
Il @ sag O ay 
O I soo @ O ay 


has rank $n$.


4, Express the condition that the two matrices 


OO, «ee a a, Oso. O 
= 5 BS B, B, Brats Be 
B, B, near B. Y% Yq s0: te 


have equal rank in terms of a geometrical property of a set of n_ lines in the plane. 
§3. Linear maps. Matrix operations 


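1. Matrices and maps. Let $\mathbb{R}^n$ and $\mathbb{R}^m$ be the vector spaces of columns of
height $n$ and $m$, respectively. Let $A = (a_{ij})$ be an $m \times n$ matrix. We define a map
$\varphi_A : \mathbb{R}^n \to \mathbb{R}^m$ as follows. For any $X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^n$ we set

$$\varphi_A(X) = x_1 A^{(1)} + x_2 A^{(2)} + \cdots + x_n A^{(n)} , \tag{1}$$

where $A^{(1)}, \dots, A^{(n)}$ are the columns of $A$ (compare with (1) in §2). Since these
columns have height $m$, the right side of (1) gives a column-vector
$Y = [y_1, y_2, \dots, y_m] \in \mathbb{R}^m$. Writing (1) in more detail, we have

$$y_i = \sum_{j=1}^{n} a_{ij} x_j = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n , \qquad i = 1, 2, \dots, m . \tag{1'}$$

For $X' = [x_1', \dots, x_n']$ and $X'' = [x_1'', \dots, x_n'']$ we obtain

$$\varphi_A(X' + X'') = \sum_{j=1}^{n} (x_j' + x_j'') A^{(j)}
= \sum_{j=1}^{n} x_j' A^{(j)} + \sum_{j=1}^{n} x_j'' A^{(j)} = \varphi_A(X') + \varphi_A(X'') .$$

In addition,

$$\varphi_A(\lambda X) = \sum_{j=1}^{n} \lambda x_j A^{(j)} = \lambda \sum_{j=1}^{n} x_j A^{(j)} = \lambda \varphi_A(X) , \qquad \lambda \in \mathbb{R} .$$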


Conversely, suppose that $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ is a map of sets which has the following
two properties:

(i) $\varphi(X' + X'') = \varphi(X') + \varphi(X'')$ for all $X', X'' \in \mathbb{R}^n$;

(ii) $\varphi(\lambda X) = \lambda \varphi(X)$ for all $X \in \mathbb{R}^n$, $\lambda \in \mathbb{R}$.

Then, letting $E_n^{(1)}, \dots, E_n^{(n)}$ and $E_m^{(1)}, \dots, E_m^{(m)}$ denote the standard basis columns
(see subsection 3 of §1) of the spaces $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, we apply properties
(i) and (ii) to an arbitrary vector $X = [x_1, x_2, \dots, x_n] = \sum_{j=1}^{n} x_j E_n^{(j)} \in \mathbb{R}^n$:

$$\varphi(X) = \varphi\Bigl( \sum_{j=1}^{n} x_j E_n^{(j)} \Bigr) = \sum_{j=1}^{n} x_j \varphi(E_n^{(j)}) . \tag{2}$$

The relation (2) shows that the map $\varphi$ is completely determined by its values on
the basis column-vectors. If we set

$$\varphi(E_n^{(j)}) = \sum_{i=1}^{m} a_{ij} E_m^{(i)} = [a_{1j}, a_{2j}, \dots, a_{mj}] = A^{(j)} \in \mathbb{R}^m , \tag{3}$$

we see that giving $\varphi$ is equivalent to giving an $m \times n$ rectangular matrix $A = (a_{ij})$ with
columns $A^{(1)}, \dots, A^{(n)}$. Then the expressions in (2) and in (1) coincide. Thus, we
may set $\varphi = \varphi_A$.

Definition. A map $\varphi = \varphi_A : \mathbb{R}^n \to \mathbb{R}^m$ having properties (i) and (ii) is called a
linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. Often (especially when $n = m$), the term linear
transformation is also used. The matrix $A$ is called the matrix of the linear map $\varphi_A$.

Let $\varphi_A$, $\varphi_{A'}$ be two linear maps $\mathbb{R}^n \to \mathbb{R}^m$ with matrices $A = (a_{ij})$ and
$A' = (a_{ij}')$. Then $\varphi_A = \varphi_{A'}$ if and only if the values $\varphi_A(X)$ and $\varphi_{A'}(X)$ coincide for all
$X \in \mathbb{R}^n$. In particular, $\varphi_A(E_n^{(j)}) = \varphi_{A'}(E_n^{(j)})$, $1 \le j \le n$, so that $A'^{(j)} = A^{(j)}$
and $A' = A$.

We summarize our results:

THEOREM 1. There is a one-to-one correspondence between linear maps from
$\mathbb{R}^n$ to $\mathbb{R}^m$ and $m \times n$ matrices. $\square$
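The correspondence of Theorem 1 is easy to exercise in code: the columns of the matrix are the images of the standard basis columns, and conversely the matrix acts on a column by taking the corresponding combination of its columns. The sketch below is illustrative; the names `apply`, `matrix_of` and the sample map `phi` are assumptions introduced for the example.

```python
def apply(a, x):
    """phi_A(X): the i-th coordinate is a_i1*x_1 + ... + a_in*x_n."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def matrix_of(phi, n):
    """Matrix of a linear map phi on R^n: column j is phi(E^(j))."""
    cols = [phi([1 if k == j else 0 for k in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*cols)]  # turn the columns into rows

# A sample linear map R^2 -> R^3 (hypothetical, chosen only for illustration).
def phi(x):
    return [x[0] + 2 * x[1], 3 * x[0], x[1] - x[0]]

A = matrix_of(phi, 2)
print(A)                  # the 3 x 2 matrix of phi
print(apply(A, [5, 7]))   # agrees with phi([5, 7])
print(phi([5, 7]))
```

Note that `matrix_of` only makes sense for maps satisfying properties (i) and (ii); for a non-linear function it would still return a matrix, but that matrix would not reproduce the function.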


It should be emphasized that it makes no sense to speak of a linear map $S \to T$ of
arbitrary sets $S$ and $T$. Conditions (i) and (ii) presuppose that $S$ and $T$ are
subspaces of vector spaces $\mathbb{R}^n$ and $\mathbb{R}^m$.

We call attention to the special case when $m = 1$. Then a linear map $\varphi : \mathbb{R}^n \to \mathbb{R}$,
which is usually called a linear function in $n$ variables, is given by specifying $n$ scalars
$a_1, a_2, \dots, a_n$:

$$\varphi(X) = \varphi(x_1, x_2, \dots, x_n) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n . \tag{4}$$

(Note that this terminology is different from that used in high school, where, in the case of
one variable $x$, a linear function is defined as a function of the form $x \mapsto ax + b$.)

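For fixed $n$ and $m$, linear maps $\mathbb{R}^n \to \mathbb{R}^m$ can be added and multiplied by
scalars. Namely, suppose that $\varphi_A, \varphi_B : \mathbb{R}^n \to \mathbb{R}^m$ are two linear maps. We define the
map

$$\varphi = \alpha \varphi_A + \beta \varphi_B : \mathbb{R}^n \to \mathbb{R}^m , \qquad \alpha, \beta \in \mathbb{R} ,$$

by setting $\varphi(X)$ equal to the corresponding linear combination of the values of $\varphi_A$ and
$\varphi_B$:

$$\varphi(X) = \alpha \varphi_A(X) + \beta \varphi_B(X) .$$

The expression on the right is an ordinary linear combination of column-vectors.
Since

$$\begin{aligned}
\varphi(X' + X'') &= \alpha \varphi_A(X' + X'') + \beta \varphi_B(X' + X'') \\
&= \alpha \{\varphi_A(X') + \varphi_A(X'')\} + \beta \{\varphi_B(X') + \varphi_B(X'')\} \\
&= \{\alpha \varphi_A(X') + \beta \varphi_B(X')\} + \{\alpha \varphi_A(X'') + \beta \varphi_B(X'')\} = \varphi(X') + \varphi(X'') ; \\
\varphi(\lambda X) &= \alpha \varphi_A(\lambda X) + \beta \varphi_B(\lambda X) = \alpha \lambda \varphi_A(X) + \beta \lambda \varphi_B(X) \\
&= \lambda \{\alpha \varphi_A(X) + \beta \varphi_B(X)\} = \lambda \varphi(X)
\end{aligned}$$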


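(here we have implicitly used the rules VS$_1$ - VS$_8$ in §1), it follows that $\varphi$ is a linear
map. By Theorem 1, we may speak of its matrix $C$: $\varphi = \varphi_C$. In order to find $C$, we
write the $j$-th column, following (3):

$$\begin{aligned}
C^{(j)} = [c_{1j}, c_{2j}, \dots, c_{mj}] &= \varphi_C(E_n^{(j)}) = \alpha \varphi_A(E_n^{(j)}) + \beta \varphi_B(E_n^{(j)}) \\
&= \alpha A^{(j)} + \beta B^{(j)} = [\alpha a_{1j} + \beta b_{1j}, \alpha a_{2j} + \beta b_{2j}, \dots, \alpha a_{mj} + \beta b_{mj}] .
\end{aligned}$$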


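It is natural to call the matrix $C = (c_{ij})$ with entries $c_{ij} = \alpha a_{ij} + \beta b_{ij}$ the linear
combination of the matrices $A$ and $B$ with coefficients $\alpha$ and $\beta$. Thus, we define

$$\alpha \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}
+ \beta \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{pmatrix}
= \begin{pmatrix} \alpha a_{11} + \beta b_{11} & \cdots & \alpha a_{1n} + \beta b_{1n} \\ \vdots & & \vdots \\ \alpha a_{m1} + \beta b_{m1} & \cdots & \alpha a_{mn} + \beta b_{mn} \end{pmatrix} . \tag{5}$$

Then,

$$\alpha \varphi_A + \beta \varphi_B = \varphi_{\alpha A + \beta B} . \tag{6}$$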
We will very frequently make use of the fact that a linear combination of linear 
functions is a linear function. 
Finally, we note that, if the rules VS$_1$ - VS$_8$ in §1 are rewritten with the column-vectors
$X$, $Y$, $Z$ everywhere replaced by $m \times n$ matrices, then we obtain rules
VSM$_1$ - VSM$_8$ (where addition and scalar multiplication of matrices are defined by (5)),
which justify our speaking of the vector space of $m \times n$ matrices. We can think of $m \times n$
matrices as a compact way of writing elements of the vector space $\mathbb{R}^{mn}$ of rows of length
$mn$ (by dividing a row into segments of length $n$ and placing these segments under one
another).


2. Matrix multiplication. The relations (5) and (6) show that the operations of
addition and scalar multiplication in the set of $m \times n$ matrices and the set of linear maps
$\mathbb{R}^n \to \mathbb{R}^m$ agree, i.e., these operations are preserved in the one-to-one correspondence
of Theorem 1. In set theory we have the important concept of the product (composition) of
two maps (see subsection 2 of §5, Ch. 1). It is reasonable to expect that the composition
of two linear maps will give us an important operation on the corresponding matrices. We
now see how this works.

Let $\varphi_B : \mathbb{R}^n \to \mathbb{R}^s$ and $\varphi_A : \mathbb{R}^s \to \mathbb{R}^m$ be linear maps, and let $\varphi_C = \varphi_A \circ \varphi_B$
be their product:

$$\mathbb{R}^n \xrightarrow{\ \varphi_B\ } \mathbb{R}^s \xrightarrow{\ \varphi_A\ } \mathbb{R}^m , \qquad \varphi_C : \mathbb{R}^n \to \mathbb{R}^m .$$

Before writing $\varphi_C$ for the composition $\varphi = \varphi_A \circ \varphi_B$, we should actually verify that this
product is a linear map, but this is easy:

(i) $\varphi(X' + X'') = \varphi_A(\varphi_B(X' + X'')) = \varphi_A(\varphi_B(X') + \varphi_B(X'')) =
\varphi_A(\varphi_B(X')) + \varphi_A(\varphi_B(X'')) = \varphi(X') + \varphi(X'')$;

(ii) $\varphi(\lambda X) = \varphi_A(\varphi_B(\lambda X)) = \varphi_A(\lambda \varphi_B(X)) = \lambda \varphi_A(\varphi_B(X)) = \lambda \varphi(X)$;

hence, by Theorem 1, the composition $\varphi$ corresponds to some matrix $C$.


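We now write the action of the maps on columns in the chain

$$[x_1, \dots, x_n] \xrightarrow{\ \varphi_B\ } [y_1, \dots, y_s] \xrightarrow{\ \varphi_A\ } [z_1, \dots, z_m]$$

explicitly using the formula (1'):

$$z_i = \sum_{k=1}^{s} a_{ik} y_k = \sum_{k=1}^{s} a_{ik} \sum_{j=1}^{n} b_{kj} x_j
= \sum_{j=1}^{n} \Bigl( \sum_{k=1}^{s} a_{ik} b_{kj} \Bigr) x_j .$$

On the other hand,

$$z_i = \sum_{j=1}^{n} c_{ij} x_j .$$

Comparing these expressions and recalling that the $x_j$ are arbitrary
real numbers, we arrive at the relations

$$c_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj} , \qquad 1 \le i \le m , \quad 1 \le j \le n . \tag{7}$$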


We shall say that the matrix $C = (c_{ij})$ is obtained by multiplying the matrix $A$ by the
matrix $B$. It is customary to write: $C = AB$. Thus, the product of an $m \times s$ matrix
$(a_{ik})$ and an $s \times n$ matrix $(b_{kj})$ is, by definition, the $m \times n$ matrix whose entries
$c_{ij}$ are given by (7). We have proved

THEOREM 2. The product $\varphi_A \varphi_B$ of two linear maps with matrices $A$ and $B$
is the linear map with matrix $C = AB$. In other words,

$$\varphi_A \varphi_B = \varphi_{AB} . \tag{8}$$

Relation (8) is a natural addition to relation (6).

We can forget about linear maps and find the product $AB$ of any two matrices $A$
and $B$, if, however, we keep in mind that the symbol $AB$ only makes sense when the
number of columns in $A$ equals the number of rows in $B$. Under this condition, rule
(7) says to "multiply the $i$-th row in $A$ by the $j$-th column in $B$ to get the
$ij$-entry in $AB$":

$$c_{ij} = (a_{i1}, \dots, a_{is}) [b_{1j}, \dots, b_{sj}] = A_{(i)} B^{(j)} . \tag{9}$$
Note that the number of rows in AB is equal to the number of rows in A, and the 
number of columns in AB is equal to the number of columns in B. In particular, the 
product of square matrices of the same size is always defined, and the product is another 
square matrix of the same size. 
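Rule (7) and the statement of Theorem 2 can be tried out directly. The sketch below is a plain illustration with integer entries: `matmul` implements the row-by-column rule, `apply` evaluates the map of a matrix on a column, and the check compares applying $B$ and then $A$ with applying $AB$ once. The function names are illustrative choices.

```python
def matmul(a, b):
    """Product of an m x s matrix a and an s x n matrix b, entry by entry."""
    s = len(b)
    if any(len(row) != s for row in a):
        raise ValueError("columns of A must match rows of B")
    n = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(s)) for j in range(n)]
            for i in range(len(a))]

def apply(a, x):
    """Value of the linear map with matrix a on the column x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

A = [[1, 2, 0],
     [0, 1, 3]]          # 2 x 3, a map R^3 -> R^2
B = [[1, 0],
     [2, 1],
     [0, 4]]             # 3 x 2, a map R^2 -> R^3
X = [1, -1]

print(matmul(A, B))              # the 2 x 2 matrix of the composition
print(apply(A, apply(B, X)))     # first B, then A
print(apply(matmul(A, B), X))    # the product matrix applied once
```

The size check in `matmul` mirrors the condition in the text: the product is only defined when the number of columns of the left factor equals the number of rows of the right factor.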


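Even in the case of square matrices, the order in which we multiply two matrices is
important, since, in general, $AB \neq BA$, as we see from the following example:

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
\neq \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} .$$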
Of course, it would have been possible to define matrix multiplication in many other
ways (for example, multiplying corresponding entries in matrices of the same size), but no
other way can compare in importance with the above type of matrix multiplication. This is
not surprising, since we arrived at it by studying the composition of linear maps, and the
concept of maps and composition of maps is one of the most fundamental in mathematics.


COROLLARY. Matrix multiplication is associative: 


A(BC) = (AB)C . 


Proof. Matrix multiplication corresponds to composition of the corresponding
linear maps (Theorem 2 and relation (8)), and, by Theorem 1 of §5, Ch. 1, composition
of any maps is associative. It is also possible to prove associativity of matrix multiplication
by a direct computation, using (7). $\square$
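A quick numerical check makes the contrast between associativity and commutativity concrete. The following sketch uses small integer matrices chosen only for illustration; it is of course not a proof, merely a direct computation with rule (7).

```python
def matmul(a, b):
    """Row-by-column product of two matrices given as lists of rows."""
    s = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(s)) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]

# Associativity: the bracketing does not matter.
print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))

# Commutativity fails in general: the order of the factors matters.
P = [[1, 0], [0, 0]]
Q = [[0, 0], [1, 0]]
print(matmul(P, Q), matmul(Q, P))
```

The pair `P`, `Q` has a zero product in one order and a non-zero product in the other, so even the property "the product is zero" depends on the order of the factors.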


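3. Square matrices. Let $M_n(\mathbb{R})$ (or simply $M_n$) denote the set of all square
$n \times n$ matrices $(a_{ij})$ with real entries $a_{ij}$.

The identity map $e_{\mathbb{R}^n} : \mathbb{R}^n \to \mathbb{R}^n$, which takes every column $X \in \mathbb{R}^n$ to itself,
obviously corresponds to the matrix

$$E = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} .$$

We can write $E = (\delta_{kj})$, where

$$\delta_{kj} = \begin{cases} 1 & \text{if } k = j , \\ 0 & \text{if } k \neq j \end{cases}$$

is the Kronecker symbol. The rule for matrix multiplication (7), with $b_{kj}$ replaced by
$\delta_{kj}$, shows that

$$EA = A = AE , \qquad \forall A \in M_n(\mathbb{R}) . \tag{10}$$

Of course, the matrix relations (10), which we verified by computation, follow from the
relations $e \varphi = \varphi = \varphi e$ for any map $\varphi$ (see subsection 2 of §5, Ch. 1), if we use
Theorem 1 and the equality (8) with $\varphi_E = e$.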


We know that matrices in $M_n(\mathbb{R})$ can be multiplied by scalars: if $A = (a_{ij})$, then
$\lambda A$ is the matrix $(\lambda a_{ij})$ (see (5)). But scalar multiplication can be considered as a
special case of matrix multiplication, since

$$\lambda A = \operatorname{diag}_n(\lambda) \cdot A = A \cdot \operatorname{diag}_n(\lambda) , \tag{11}$$

where

$$\operatorname{diag}_n(\lambda) = \lambda E = \begin{pmatrix} \lambda & 0 & \cdots & 0 \\ 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda \end{pmatrix}$$

is the scalar matrix we have encountered before (see §3, Ch. 1).
An easy verification shows that, as claimed in (11), $\operatorname{diag}_n(\lambda)$ commutes with any
matrix $A$. The following converse is very important in applications.

THEOREM 3. A matrix in $M_n$ which commutes with every matrix in $M_n$ must
be a scalar matrix.


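Proof. Define $E_{ij}$ to be the $n \times n$ matrix having 1 at the intersection of the
$i$-th row and $j$-th column and 0 everywhere else. If $Z = (z_{ij})$ is a matrix which
commutes with every matrix in $M_n$, then, in particular, it commutes with all of the $E_{ij}$:
$Z E_{ij} = E_{ij} Z$, i.e.,

$$\begin{pmatrix}
0 & \cdots & z_{1i} & \cdots & 0 \\
0 & \cdots & z_{2i} & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & z_{ni} & \cdots & 0
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
z_{j1} & z_{j2} & \cdots & z_{jn} \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix} ,$$

where the matrix on the left has all zero columns except for the $j$-th, and the matrix on
the right has all zero rows except for the $i$-th. Equating these matrices immediately leads
to the relations $z_{ki} = 0$ for $k \neq i$ and $z_{ii} = z_{jj}$. Letting $i$ and $j$ vary, we
conclude that $Z$ must be a scalar matrix. $\square$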
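The proof only ever uses the matrices $E_{ij}$, so commuting with these $n^2$ matrices already forces $Z$ to be scalar. The sketch below tests that on small integer examples; `unit`, `commutes_with_all_units` and `is_scalar` are hypothetical helper names introduced for the illustration.

```python
def matmul(a, b):
    """Row-by-column product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def unit(n, i, j):
    """The matrix E_ij: 1 in row i, column j (0-based), zeros elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def commutes_with_all_units(z):
    n = len(z)
    return all(matmul(z, unit(n, i, j)) == matmul(unit(n, i, j), z)
               for i in range(n) for j in range(n))

def is_scalar(z):
    n = len(z)
    return all(z[i][j] == (z[0][0] if i == j else 0)
               for i in range(n) for j in range(n))

for z in ([[3, 0], [0, 3]], [[1, 0], [0, 2]], [[1, 1], [0, 1]]):
    print(z, commutes_with_all_units(z), is_scalar(z))
```

On each sample the two tests agree: the matrix commutes with every $E_{ij}$ exactly when it is scalar.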


We further note the relations $\lambda(AB) = (\lambda A)B = A(\lambda B)$, which follow immediately
from the definition of matrix multiplication and scalar multiplication of matrices, or,
if we prefer, from (11) and the associativity of matrix multiplication.

Given a matrix $A \in M_n(\mathbb{R})$, we can try to find a matrix $B \in M_n(\mathbb{R})$ such that

$$AB = E = BA . \tag{12}$$

If such a matrix $B$ exists, then (12) translates into the following relation for the
corresponding linear maps:

$$\varphi_A \varphi_B = e = \varphi_B \varphi_A . \tag{12'}$$

In other words, $\varphi_B = \varphi_A^{-1}$ is the inverse of $\varphi_A$. According to Theorem 2 of §5 of
Ch. 1, $\varphi_A^{-1}$ exists if and only if $\varphi_A$ is a bijective map. In that case $\varphi_A^{-1}$ is uniquely
determined. Since $\varphi_A(0) = 0$, bijectivity of $\varphi_A$ means that, in particular,

$$X \neq 0 , \ X \in \mathbb{R}^n \implies \varphi_A(X) \neq 0 . \tag{13}$$


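Now let $\varphi_A$ be a bijective linear map from $\mathbb{R}^n$ to $\mathbb{R}^n$. We already know that
it has an inverse map $\varphi_A^{-1}$, but at this point we cannot say that this inverse is a linear
map. To see that $\varphi_A^{-1}$ is, in fact, a linear map, we introduce the column-vectors

$$\begin{aligned}
X &= \varphi_A^{-1}(X' + X'') - \varphi_A^{-1}(X') - \varphi_A^{-1}(X'') , \\
Y &= \varphi_A^{-1}(\lambda Y') - \lambda \varphi_A^{-1}(Y') ,
\end{aligned}$$

and apply $\varphi_A$ to both sides of these equalities. Because $\varphi_A$ is linear, we obtain

$$\begin{aligned}
\varphi_A(X) &= \varphi_A \varphi_A^{-1}(X' + X'') - \varphi_A \varphi_A^{-1}(X') - \varphi_A \varphi_A^{-1}(X'') , \\
\varphi_A(Y) &= \varphi_A \varphi_A^{-1}(\lambda Y') - \lambda \varphi_A \varphi_A^{-1}(Y') .
\end{aligned}$$

Since $\varphi_A \varphi_A^{-1} = e$, we have

$$\begin{aligned}
\varphi_A(X) &= e(X' + X'') - e(X') - e(X'') = 0 , \\
\varphi_A(Y) &= e(\lambda Y') - \lambda e(Y') = 0 ,
\end{aligned}$$

and hence, because of (13), it follows that $X$ and $Y$ are the zero vector. Thus,
properties (i) and (ii) of subsection 1 are fulfilled, i.e., $\varphi_A^{-1}$ is a linear map. We
then have $\varphi_A^{-1} = \varphi_B$ for some matrix $B$. If we rewrite (12') in the form
$\varphi_{AB} = \varphi_E = \varphi_{BA}$ (see (8)) and again use Theorem 1, we arrive at the equality (12). We
conclude: a matrix $A \in M_n(\mathbb{R})$ has an inverse if and only if the map $\varphi_A : \mathbb{R}^n \to \mathbb{R}^n$ is
bijective. In that case the map $\varphi_A^{-1}$ is linear.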
Bijectivity of $\varphi_A$ is equivalent to the condition that every column-vector $Y \in \mathbb{R}^n$
can be written in a unique way in the form (1)

$$Y = \varphi_A(X) = x_1 A^{(1)} + x_2 A^{(2)} + \cdots + x_n A^{(n)} ,$$

where $A^{(1)}, \dots, A^{(n)}$ are the columns of $A$. (Surjectivity of $\varphi_A$ ensures the
existence of $X$ for which $Y = \varphi_A(X)$, and injectivity of $\varphi_A$ gives uniqueness of $X$:
if we had $Y = \varphi_A(X') = \varphi_A(X'')$, then we would have $\varphi_A(X' - X'') = \varphi_A(X') - \varphi_A(X'') = 0$,
and so, by (13), $X' - X'' = 0$.) Hence, when $\varphi_A$ is bijective, $\mathbb{R}^n$ coincides with the
column space $V_v(A) = \langle A^{(1)}, \dots, A^{(n)} \rangle$; thus, $\operatorname{rank} A = \dim \mathbb{R}^n = n$.


If $A$ has an inverse, then it is unique: it is the matrix corresponding to $\varphi_A^{-1}$. It
is customary to denote this inverse $A^{-1}$. Then we have

$$\varphi_A^{-1} = \varphi_{A^{-1}} . \tag{14}$$

A square matrix $A$ which has an inverse $A^{-1}$ is called non-singular. The corresponding
linear map $\varphi_A$ is also called non-singular. Otherwise they are called singular.

We summarize our results in the following theorem.

THEOREM 4. An $n \times n$ square matrix $A$ is non-singular if and only if it has
rank $n$. In that case the inverse map $\varphi_A^{-1}$ of $\varphi_A$ is linear and is given by (14). $\square$


COROLLARY. If $A$ is non-singular, then so is $A^{-1}$, and $(A^{-1})^{-1} = A$.
If $A, B, \dots, C, D$ are non-singular $n \times n$ matrices, then the product $AB \cdots CD$ is
also non-singular, and $(AB \cdots CD)^{-1} = D^{-1} C^{-1} \cdots B^{-1} A^{-1}$.

One can prove this either using the corollary to Theorem 2 of §5, Ch. 1, or else
using the symmetry of the condition $A^{-1} A = E = A A^{-1}$. $\square$
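The corollary can be checked on small examples. The sketch below is an illustration under assumptions: it computes inverses by Gauss-Jordan elimination on the extended array $(A \mid E)$ with exact rational arithmetic, which is one standard method and not necessarily the one the text refers to; `inverse` and `matmul` are illustrative names.

```python
from fractions import Fraction

def matmul(a, b):
    """Row-by-column product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse(a):
    """Inverse of a square matrix via Gauss-Jordan elimination on (A | E)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular (its rank is less than n)")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[c])]
    return [row[n:] for row in m]

A = [[2, 1], [1, 1]]
B = [[1, 2], [3, 4]]
print(inverse(A))
print(inverse(matmul(A, B)) == matmul(inverse(B), inverse(A)))  # order reverses
print(inverse(inverse(A)) == A)
```

The failure branch of `inverse` corresponds to Theorem 4: no pivot can be found in some column exactly when the rank is less than $n$.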


We shall give an explicit formula for $A^{-1}$ in Chapter 3. Here we merely mention that the actual computation of $A^{-1}$ (or even multiplication of two matrices), say by the method at the end of this chapter, usually requires a large number of arithmetical operations. In some applications one encounters matrices of size $100 \times 100$ and larger. If A and B are two $n \times n$ matrices, then the computation of C = AB involves finding $n^2$ entries $c_{ij}$ using the formula (7) (or (9)), and each of these $n^2$ computations requires 2n-1 multiplications or additions. Thus, $(2n-1)n^2$ operations are required in all, i.e., almost two million operations for n = 100. This is still an easy problem for a modern computer, but what if we want to find the m-th power $A^m$ with m > 1000? Here, by definition, $A^m = AA^{m-1}$, and, in fact, it follows easily from associativity (see the corollary to Theorem 2) that $A^m = A^kA^{m-k}$ if 0 < k < m. To compute $A^m$ one can use various additional devices which come from either a general study of linear algebra or from some special knowledge of the nature of the matrix A. We shall give three examples.
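The operation count quoted above can be checked on a small case. The following is only an illustrative sketch: the function name `matmul_with_count` and the test matrices are assumptions made for this example, not notation from the text.

```python
def matmul_with_count(A, B):
    """Multiply two n x n matrices by the row-by-column rule, counting operations."""
    n = len(A)
    ops = 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = A[i][0] * B[0][j]        # first multiplication
            ops += 1
            for k in range(1, n):
                total += A[i][k] * B[k][j]   # one multiplication and one addition
                ops += 2
            C[i][j] = total
    return C, ops

n = 4
A = [[i + j for j in range(n)] for i in range(n)]      # arbitrary sample matrices
B = [[i * j + 1 for j in range(n)] for i in range(n)]
C, ops = matmul_with_count(A, B)
print(ops, (2 * n - 1) * n ** 2)    # 112 112
print((2 * 100 - 1) * 100 ** 2)     # 1990000, i.e. almost two million for n = 100
```

Each entry costs n multiplications and n-1 additions, which is where the factor 2n-1 comes from.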


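Example 1. If

$$A = \mathrm{diag}\{a_1, \ldots, a_n\} = \begin{pmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{pmatrix},$$

then obviously

$$A^m = \mathrm{diag}\{a_1^m, \ldots, a_n^m\} = \begin{pmatrix} a_1^m & & 0 \\ & \ddots & \\ 0 & & a_n^m \end{pmatrix}.$$

Example 2. Let

$$A = \begin{pmatrix} a & c \\ 0 & b \end{pmatrix}.$$

Then, using induction on m, one can show that

$$A^m = \begin{pmatrix} a^m & c\,\dfrac{a^m - b^m}{a - b} \\ 0 & b^m \end{pmatrix},$$

where $\dfrac{a^m - b^m}{a - b} = a^{m-1} + a^{m-2}b + \cdots + ab^{m-2} + b^{m-1}$. In particular, if a = b, we have

$$\begin{pmatrix} a & c \\ 0 & a \end{pmatrix}^m = \begin{pmatrix} a^m & ma^{m-1}c \\ 0 & a^m \end{pmatrix}.$$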


Example 3. Using induction on m, it is not hard to show that the m-th power of the matrix

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$

has the form

$$A^m = \begin{pmatrix} f_{m-1} & f_m \\ f_m & f_{m+1} \end{pmatrix}, \qquad (15)$$

where the integers $f_0 = 0$, $f_1 = 1$, $f_2 = 1$, $f_3 = 2, \ldots$ are defined recursively by $f_{m+1} = f_m + f_{m-1}$. These are precisely the Fibonacci numbers (see example 2 at the end of §7 Ch. 1).
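We introduce the matrix

$$B = \begin{pmatrix} \dfrac{1}{\sqrt5} & \dfrac{\lambda_1}{\sqrt5} \\ -1 & -\lambda_2 \end{pmatrix}$$

with determinant 1 (see §4 Ch. 1), where

$$\lambda_1 = \frac{1 + \sqrt5}{2}, \qquad \lambda_2 = \frac{1 - \sqrt5}{2}.$$

A simple computation shows that

$$B^{-1} = \begin{pmatrix} -\lambda_2 & -\dfrac{\lambda_1}{\sqrt5} \\ 1 & \dfrac{1}{\sqrt5} \end{pmatrix} \quad \text{and} \quad A = B^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} B.$$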


But if three $n \times n$ matrices A, B, C, with B non-singular, are connected by the relationship $A = B^{-1}CB$, then

$$A^m = B^{-1}CB \cdot B^{-1}CB \cdots B^{-1}CB = B^{-1}C^mB$$

(the inner factors $BB^{-1}$ equal E, and so "cancel"). In our case, using example 1 and the relation (15), we have


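$$\begin{pmatrix} f_{m-1} & f_m \\ f_m & f_{m+1} \end{pmatrix} = B^{-1} \begin{pmatrix} \lambda_1^m & 0 \\ 0 & \lambda_2^m \end{pmatrix} B = \begin{pmatrix} -\lambda_2 & -\dfrac{\lambda_1}{\sqrt5} \\ 1 & \dfrac{1}{\sqrt5} \end{pmatrix} \begin{pmatrix} \dfrac{\lambda_1^m}{\sqrt5} & \dfrac{\lambda_1^{m+1}}{\sqrt5} \\ -\lambda_2^m & -\lambda_2^{m+1} \end{pmatrix} = \begin{pmatrix} * & \dfrac{1}{\sqrt5}(\lambda_1^m - \lambda_2^m) \\ * & * \end{pmatrix}$$


(the stars stand for entries which we are not interested in; the upper right entry is simplified using $\lambda_1\lambda_2 = -1$).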
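Comparing the upper right entry in the first and last matrices in this sequence of equalities, we obtain the following formula for the m-th Fibonacci number:

$$f_m = \frac{1}{\sqrt5}\left[\left(\frac{1 + \sqrt5}{2}\right)^m - \left(\frac{1 - \sqrt5}{2}\right)^m\right].$$

We see that $f_m \approx \frac{1}{\sqrt5}\lambda_1^m$ for large m (i.e., the Fibonacci numbers start to look like a geometric progression), since $\lim_{m \to \infty} \lambda_2^m = 0$.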
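Both the relation (15) and the closed formula for $f_m$ are easy to test numerically. The sketch below is an illustration only: the helper names `matmul2`, `power` and `closed_formula` are assumptions for this example, and the closed formula is evaluated in floating point, so it is compared after rounding.

```python
from math import sqrt

def matmul2(X, Y):
    """Product of two 2 x 2 integer matrices."""
    return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
            [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]

def power(A, m):
    """A^m for m >= 1, computed from the definition A^m = A * A^(m-1)."""
    P = A
    for _ in range(m - 1):
        P = matmul2(A, P)
    return P

def closed_formula(m):
    """(lam1^m - lam2^m) / sqrt(5), evaluated in floating point."""
    lam1 = (1 + sqrt(5)) / 2
    lam2 = (1 - sqrt(5)) / 2
    return (lam1 ** m - lam2 ** m) / sqrt(5)

A = [[0, 1], [1, 1]]
for m in range(1, 11):
    f_m = power(A, m)[0][1]                  # upper right entry of A^m, relation (15)
    assert f_m == round(closed_formula(m))   # agrees with the closed formula

print(power(A, 10))   # [[34, 55], [55, 89]]
```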


We have obtained quite a few rules for working with square matrices: VSM, . VSM, 
(see the remark at the end of subsection 1), associativity (the corollary to Theorem 2), the 


relation (10), and Theorem 4. We further note the so-called distributive laws:

$$(A + B)C = AC + BC, \qquad C(A + B) = CA + CB, \qquad (16)$$

where A, B, C are any matrices in $M_n(\mathbb{R})$. To verify this, we set $A = (a_{ij})$, $B = (b_{ij})$, $C = (c_{ij})$. Using the distributive law for $\mathbb{R}$, we see that for any $i, j = 1, \ldots, n$, we have

$$\sum_{k=1}^{n} (a_{ik} + b_{ik})c_{kj} = \sum_{k=1}^{n} a_{ik}c_{kj} + \sum_{k=1}^{n} b_{ik}c_{kj}.$$

The left side of this equality is the entry in the i-th row and j-th column of the matrix (A + B)C, and the two sums on the right are the (i,j)-entries in the matrices AC and BC, respectively. The second distributive law in (16) is verified in exactly the same way. (We really do have to verify both laws, since multiplication is not commutative in $M_n(\mathbb{R})$.)


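The distributive laws

$$(\varphi + \psi)\xi = \varphi\xi + \psi\xi, \qquad \xi(\varphi + \psi) = \xi\varphi + \xi\psi \qquad (16')$$

for linear maps $\varphi, \psi, \xi$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ follow immediately from (16), because of the correspondence between maps and matrices. Alternately, we could have first proved (16'), using the following sequence of equalities, and then derived (16) from (16'):

$$((\varphi + \psi)\xi)(X) = (\varphi + \psi)(\xi X) = \varphi(\xi X) + \psi(\xi X) = (\varphi\xi)(X) + (\psi\xi)(X) = (\varphi\xi + \psi\xi)(X), \qquad X \in \mathbb{R}^n.$$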


EXERCISES 


1. Which of the following maps are linear? 
a) ne ete — [x ycte 9 Xys%)5 
2 ire 
b) i pace = Bake sa: 


c) CX) sXo000° ; x] |> [xX tXoo0t $3, 2 xg 0 oe aoe : 


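2. Prove that, if $ad - bc \ne 0$,

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

In particular, $ad - bc = 1 \Rightarrow A^{-1} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$.


Does $A^{-1}$ exist if $ad - bc = 0$?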


3. Prove that any matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

satisfies the relation

$$A^2 = (a + d)A - (ad - bc)E$$

(in other words, A is a "root" of the quadratic equation $x^2 - (a + d)x + (ad - bc) = 0$).

4. If $ad - bc \ne 0$, use the relation in Exercise 3 to find the inverse matrix $A^{-1}$.


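5. Prove that

$$\begin{pmatrix} 1 & a & c \\ 0 & 1 & b \\ 0 & 0 & 1 \end{pmatrix}^m = \begin{pmatrix} 1 & ma & \dfrac{m(m-1)}{2}ab + mc \\ 0 & 1 & mb \\ 0 & 0 & 1 \end{pmatrix}.$$

Find the inverse of the matrix $\begin{pmatrix} 1 & a & c \\ 0 & 1 & b \\ 0 & 0 & 1 \end{pmatrix}$.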


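6. Verify that

$$\begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}^3 = E.$$

7. Prove that if

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^m = 0, \quad \text{then} \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix}^2 = 0.$$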


8. Matrices of the following form, called Markov or stochastic matrices, play an important role in applications:

$$P = (p_{ij}), \qquad p_{ij} \ge 0, \qquad \sum_{i=1}^{n} p_{ij} = 1, \quad j = 1, 2, \ldots, n.$$

The linear maps $\varphi_P$ corresponding to Markov matrices are usually applied to the following special type of "probability" column-vectors:

$$X = [x_1, \ldots, x_n], \qquad x_i \ge 0, \qquad \sum_{i=1}^{n} x_i = 1.$$


These definitions come from problems in the natural sciences. The compatibility of the 
definitions with one another is clear from the following assertions, which the reader should 
prove at least in the case n=2. 

(a) A matrix $P \in M_n(\mathbb{R})$ is Markov if and only if PX is a probability vector whenever X is a probability vector (where we denote $PX = \varphi_P(X)$).


(b) If P is a positive Markov matrix (i.e., $p_{ij} > 0$, $\forall i, j$), then every probability vector X corresponds to a positive probability vector PX (all of whose components are strictly positive).


(c) If P and Q are Markov matrices, then PQ is also Markov. In particular, any power $P^k$ of a Markov matrix is Markov.


§4. The space of solutions


1. Solving a homogeneous linear system. It follows from the introductory remarks at the beginning of §§2,3 that an $m \times n$ system of equations with matrix A and column of free terms $B = [b_1, \ldots, b_m]$ can be briefly written in the form

$$\varphi_A(X) = B, \qquad (1)$$

or

$$AX = B \qquad (1')$$

(the left side is the product of an $m \times n$ matrix and an $n \times 1$ matrix).

Suppose for a minute that m = n and the square matrix A is non-singular (see subsection 3 of §3). We then obtain a solution of (1), in fact a unique solution, if we multiply both sides of the matrix equation by $A^{-1}$ on the left: $X = EX = (A^{-1}A)X = A^{-1}(AX) = A^{-1}B$. This convenient notation for expressing the solution of the square system does not free us from having to make computations, since the matrix $A^{-1}$ is not given to us in advance. But we should permit ourselves to take some satisfaction, at least aesthetic, in this use of the matrix apparatus developed in §3. We now make use of this apparatus to study all solutions of the system (1) in the general case. We start by considering the associated homogeneous system, obtained by setting $B = [0, 0, \ldots, 0] = 0$.


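By the kernel of a linear map $\varphi_A : \mathbb{R}^n \to \mathbb{R}^m$ we mean the set

$$\mathrm{Ker}\,\varphi_A = \{X \in \mathbb{R}^n \mid \varphi_A(X) = 0\}.$$

In other words, $\mathrm{Ker}\,\varphi_A$ is the set of solutions of the homogeneous system with matrix A. In fact, $\mathrm{Ker}\,\varphi_A$ is actually a subspace of $\mathbb{R}^n$ (called the space of solutions of the homogeneous linear system), as we already noted at the beginning of §1, and as easily follows from the linearity of the map $\varphi_A$:

$$X', X'' \in \mathrm{Ker}\,\varphi_A \;\Rightarrow\; \varphi_A(\alpha X' + \beta X'') = \alpha\varphi_A(X') + \beta\varphi_A(X'') = 0 \;\Rightarrow\; \alpha X' + \beta X'' \in \mathrm{Ker}\,\varphi_A.$$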


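Next, we note that the image $\mathrm{Im}\,\varphi_A$ of the map $\varphi_A$ is a subspace of $\mathbb{R}^m$: if $B' = \varphi_A(X')$ and $B'' = \varphi_A(X'')$, then we have

$$\alpha B' + \beta B'' = \alpha\varphi_A(X') + \beta\varphi_A(X'') = \varphi_A(\alpha X' + \beta X'') \in \mathrm{Im}\,\varphi_A.$$

To say that the system (1) is compatible is equivalent to saying that $B \in \mathrm{Im}\,\varphi_A$. Let

$$s = \dim \mathrm{Ker}\,\varphi_A, \qquad r = \dim \mathrm{Im}\,\varphi_A$$

be the dimensions of the spaces $\mathrm{Ker}\,\varphi_A$ and $\mathrm{Im}\,\varphi_A$. It follows from the definition of dimension in §1 that $s \le n$, $r \le m$. We also have $r \le n$, since a linearly independent set $\varphi_A(X^{(1)}), \ldots, \varphi_A(X^{(k)})$ in $\mathrm{Im}\,\varphi_A$ can only be obtained if the set $X^{(1)}, \ldots, X^{(k)}$ is linearly independent in $\mathbb{R}^n$. We obtain more precise information from the following theorem.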


THEOREM 1. The equality r + s = n holds. Furthermore the number $r = \dim \mathrm{Im}\,\varphi_A$ coincides with the rank of the matrix A (and so is called the rank of the linear map $\varphi_A$).


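Proof. We choose a basis $X^{(1)}, \ldots, X^{(s)}$ of the subspace $\mathrm{Ker}\,\varphi_A \subset \mathbb{R}^n$ and complete it to a basis $X^{(1)}, \ldots, X^{(s)}, X^{(s+1)}, \ldots, X^{(n)}$ of the entire space $\mathbb{R}^n$. This can always be done, as shown in the proof of Theorem 2 of §1 (and Exercise 5 of §1). For any vector $X = \sum_{i=1}^{n} \alpha_i X^{(i)} \in \mathbb{R}^n$ we have

$$\varphi_A(X) = \sum_{i=1}^{n} \alpha_i\varphi_A(X^{(i)}) = \sum_{k \ge s+1} \alpha_k\varphi_A(X^{(k)}),$$

and hence $\mathrm{Im}\,\varphi_A = \langle \varphi_A(X^{(s+1)}), \ldots, \varphi_A(X^{(n)}) \rangle$, so that $r \le n - s$. The vectors $\varphi_A(X^{(s+1)}), \ldots, \varphi_A(X^{(n)})$ are linearly independent, since if we have

$$0 = \sum_{k \ge s+1} \alpha_k\varphi_A(X^{(k)}) = \varphi_A\Big(\sum_{k \ge s+1} \alpha_k X^{(k)}\Big),$$

then this means that $\sum_{k \ge s+1} \alpha_k X^{(k)} \in \mathrm{Ker}\,\varphi_A$, but, by our choice of $X^{(1)}, \ldots, X^{(n)}$, this is only possible if $\alpha_{s+1} = \cdots = \alpha_n = 0$. Thus, r = n - s. Next, by the definition of $\varphi_A$, if $X = [x_1, \ldots, x_n]$, then we have

$$\varphi_A(X) = x_1A^{(1)} + \cdots + x_nA^{(n)}, \qquad \mathrm{Im}\,\varphi_A = \langle A^{(1)}, \ldots, A^{(n)} \rangle.$$

But the dimension of the linear span $\langle A^{(1)}, \ldots, A^{(n)} \rangle$ of the columns of A is nothing other than the rank of A. □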


We have already encountered a special case of Theorem 1: if A is a non-singular square matrix of order n, then A and $\varphi_A$ both have the maximal possible rank n.

In order to find a basis for the space of solutions of a homogeneous linear system AX = 0 of rank r, we choose r basis columns in A (a practical method for doing this will be given in the next chapter). If we permute the columns (equivalently, re-index the unknowns), we may assume that the first r columns $A^{(1)}, \ldots, A^{(r)}$ are basis columns. Any set of r+1 columns $A^{(1)}, \ldots, A^{(r)}, A^{(k)}$, k > r, is then linearly dependent, and, by Theorem 1(v) of §1, we can write a system of relations

$$x_1^{(k)}A^{(1)} + x_2^{(k)}A^{(2)} + \cdots + x_r^{(k)}A^{(r)} + A^{(k)} = 0, \qquad k = r+1, \ldots, n.$$

The n-r column-vectors

$$\begin{aligned} X^{(r+1)} &= [x_1^{(r+1)}, x_2^{(r+1)}, \ldots, x_r^{(r+1)}, 1, 0, \ldots, 0], \\ X^{(r+2)} &= [x_1^{(r+2)}, x_2^{(r+2)}, \ldots, x_r^{(r+2)}, 0, 1, \ldots, 0], \\ &\;\;\vdots \\ X^{(n)} &= [x_1^{(n)}, x_2^{(n)}, \ldots, x_r^{(n)}, 0, 0, \ldots, 1] \end{aligned} \qquad (2)$$

are obviously linearly independent (because of the special form of the last n-r components). Since these column-vectors are solutions of the homogeneous system associated to (1), it follows by Theorem 1 that they are a basis of the space $\mathrm{Ker}\,\varphi_A$ of all solutions.

Any basis for the space of solutions of a homogeneous linear system AX = 0 is called a fundamental system of solutions. A set of vectors of the form (2) is called a normalized fundamental system. According to the corollary to Theorem 1 of §2, the rank of this system, $s = \dim \mathrm{Ker}\,\varphi_A$, is equal to the number of free variables in the linear system.
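As an illustration of a normalized fundamental system, here is a minimal sketch that row-reduces A with exact fractions and builds one solution per free unknown. It is not the book's method verbatim: instead of permuting columns so that the basis columns come first, it leaves the pivot columns where they are, and the names `rref` and `fundamental_system` are assumptions made for this example.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form of A (a list of rows) and the list of pivot columns."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue                          # no pivot in this column: a free unknown
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]
        for i in range(rows):
            if i != r:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return M, pivots

def fundamental_system(A):
    """Solutions of AX = 0, one per free unknown (that unknown 1, other free unknowns 0)."""
    M, pivots = rref(A)
    n = len(A[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        X = [Fraction(0)] * n
        X[f] = Fraction(1)
        for row, p in zip(M, pivots):
            X[p] = -row[f]                    # solve the pivot equation for x_p
        basis.append(X)
    return basis, len(pivots)

A = [[1, 2, 0, 1],
     [0, 0, 1, 1],
     [1, 2, 1, 2]]
basis, r = fundamental_system(A)
s, n = len(basis), len(A[0])
print(r, s, n)                                # rank, dimension of the kernel, number of unknowns
for X in basis:
    assert all(sum(a * x for a, x in zip(row, X)) == 0 for row in A)
assert r + s == n                             # Theorem 1
```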

The following assertion (which we shall not need later) reveals a certain "geometrical" 


interpretation for systems of linear equations. 


THEOREM 2. Every subspace $V \subset \mathbb{R}^n$ of dimension s is the space of solutions for some homogeneous linear system of rank r = n - s.


Proof. Let $V = \langle A^{(1)}, \ldots, A^{(s)} \rangle$. As in the proof of Theorem 1, we complete the linearly independent set $A^{(1)}, \ldots, A^{(s)}$ to a basis $A^{(1)}, \ldots, A^{(s)}, A^{(s+1)}, \ldots, A^{(n)}$ of the entire space $\mathbb{R}^n$. Any column-vector $X = [x_1, \ldots, x_n] \in \mathbb{R}^n$ can be written in a unique way in the form

$$X = \sum_{i=1}^{n} x'_i A^{(i)} = AX', \qquad (3)$$

where A is the $n \times n$ square matrix made up of the columns $A^{(i)}$, and $X' = [x'_1, x'_2, \ldots, x'_n]$. Since the columns $A^{(1)}, \ldots, A^{(n)}$ are linearly independent, it follows that rank A = n, and, by Theorem 4 of §3, the matrix A has an inverse $A^{-1} = (a'_{ij})$. We have

$$X' = A^{-1}X, \qquad \text{i.e.,} \qquad x'_i = \sum_{j=1}^{n} a'_{ij}x_j, \quad i = 1, \ldots, n.$$

The corollary to Theorem 4 of §3 shows that rank $A^{-1}$ = n, and hence any r rows of $A^{-1}$ are linearly independent. Thus, the homogeneous system

$$a'_{s+1,1}x_1 + \cdots + a'_{s+1,n}x_n = 0, \quad \ldots, \quad a'_{n1}x_1 + \cdots + a'_{nn}x_n = 0$$

has rank r = n - s. But the set of solutions of this system consists precisely of those column-vectors X of the form (3) for which $x'_{s+1} = 0, \ldots, x'_n = 0$, i.e., it consists precisely of the vectors in the subspace V. □


2. Linear manifolds. Solving a non-homogeneous system. Let V be a subspace of $\mathbb{R}^n$, and let $X^0$ be a fixed vector in $\mathbb{R}^n$. The set

$$V + X^0 = \{X + X^0 \mid X \in V\}$$

is called a linear manifold of type V and dimension dim V. The geometrical picture illustrates what should be an intuitively clear notion: $V + X^0$ is the space V translated (shifted) by the vector $X^0$. The subspace V itself is also a linear manifold, corresponding to translation by the vector $X^0 = 0$. Two linear manifolds coincide if and only if they are obtained by translation by vectors X' and X'' such that $X' - X'' \in V$ (the verification is left to the reader). In particular, if X' is any vector in the linear manifold $V + X^0$, then V + X' coincides with $V + X^0$.

For example, let $V = \langle E^{(1)}, E^{(2)}, E^{(3)} \rangle \subset \mathbb{R}^5$, $X^0 = [0,0,1,1,0]$, $X' = [0,0,0,1,0]$. Then

$$V + X^0 = V + X' = \{[x, y, z, 1, 0] \mid x, y, z \in \mathbb{R}\}.$$


We now turn to the non-homogeneous system of linear equations (1). Suppose that (1) is compatible, i.e., by Theorem 2 of §2, the rank of A and the rank of (A | B) coincide. Let $X^0 = [x_1^0, \ldots, x_n^0]$ be a fixed solution of the system, so that $\varphi_A(X^0) = B$. If X' is any other solution of (1), then $\varphi_A(X' - X^0) = \varphi_A(X') - \varphi_A(X^0) = B - B = 0$, i.e., the difference $X'' = X' - X^0$ of two solutions to (1) is always a solution to the corresponding homogeneous system, and $X' = X'' + X^0$. Conversely, if $\varphi_A(X'') = 0$, then

$$\varphi_A(X'' + X^0) = \varphi_A(X'') + \varphi_A(X^0) = 0 + B = B.$$

We thus have the following assertion.

THEOREM 3. The solutions of a compatible non-homogeneous linear system form a linear manifold of type V, where $V = \mathrm{Ker}\,\varphi_A$ is the linear subspace of solutions to the corresponding homogeneous system. □
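Theorem 3 can be illustrated on a small system. In the sketch below the matrix A, the fixed solution `X0` and the two kernel vectors are assumptions chosen for the example (the free terms B are computed from `X0`, so the system is compatible by construction); `apply` is a hypothetical helper for $\varphi_A$.

```python
def apply(A, X):
    """phi_A(X) = AX for a matrix A (list of rows) and a column X (list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[1, 2, 0, 1],
     [0, 0, 1, 1],
     [1, 2, 1, 2]]
X0 = [1, 1, 1, 1]                            # a fixed solution
B = apply(A, X0)                             # free terms, so that AX = B is compatible
kernel = [[-2, 1, 0, 0], [-1, 0, -1, 1]]     # two independent solutions of AX = 0

for v in kernel:
    assert apply(A, v) == [0, 0, 0]

# Every vector of the manifold Ker + X0 solves the non-homogeneous system.
for a in range(-2, 3):
    for b in range(-2, 3):
        X = [x0 + a * u + b * v for x0, u, v in zip(X0, *kernel)]
        assert apply(A, X) == B

# Conversely, the difference of two solutions lies in the kernel.
X1 = [x0 + 3 * u - 2 * v for x0, u, v in zip(X0, *kernel)]
difference = [p - q for p, q in zip(X1, X0)]
assert apply(A, difference) == [0, 0, 0]
```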


3. The rank of a product of matrices. The action of a product $\varphi\psi$ of maps can be roughly depicted by the diagram

$$U \xrightarrow{\;\psi\;} V \xrightarrow{\;\varphi\;} W$$

which in the case of linear maps of vector spaces is meant to convey the implication: $\varphi\psi(U) \subset \varphi(V) \Rightarrow \mathrm{rank}\,\varphi\psi \le \mathrm{rank}\,\varphi$. Furthermore, a basis of $\psi(U)$ maps onto a set of vectors containing a basis of $\varphi\psi(U)$, and hence

$$\mathrm{rank}\,\varphi\psi \le \mathrm{rank}\,\psi.$$

Thus,

$$\mathrm{rank}\,\varphi\psi \le \min\{\mathrm{rank}\,\varphi, \mathrm{rank}\,\psi\}. \qquad (4)$$

But $\mathrm{rank}\,\varphi_A = \mathrm{rank}\,A$ and $\mathrm{rank}\,AB = \mathrm{rank}\,\varphi_{AB} = \mathrm{rank}\,\varphi_A\varphi_B$; hence, the inequality (4) leads to the following useful fact.


THEOREM 4. The rank of a product of matrices is less than or equal to the rank of each factor:

$$\mathrm{rank}\,AB \le \min\{\mathrm{rank}\,A, \mathrm{rank}\,B\}. \qquad (4')$$ □


COROLLARY 1. If B and C are non-singular square matrices of order m and n, respectively, and if A is any $m \times n$ matrix, then

$$\mathrm{rank}\,BAC = \mathrm{rank}\,A.$$

Proof. By Theorem 4, we have

$$\mathrm{rank}\,BAC \le \mathrm{rank}\,BA = \mathrm{rank}\,BA(CC^{-1}) = \mathrm{rank}\,(BAC)C^{-1} \le \mathrm{rank}\,BAC,$$

so that rank BAC = rank BA. We similarly prove that rank BA = rank A. □
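A quick numerical check of Theorem 4 and Corollary 1 is sketched below. The helpers `rank` (Gaussian elimination over exact fractions) and `matmul`, as well as the sample matrices, are assumptions made for this illustration.

```python
from fractions import Fraction

def rank(A):
    """Rank of A (a list of rows), found by row reduction to step form."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Theorem 4: the rank of a product can drop below the ranks of both factors.
A1 = [[1, 0], [0, 0]]
B1 = [[0, 0], [0, 1]]
assert rank(matmul(A1, B1)) == 0 and rank(A1) == 1 and rank(B1) == 1

# Corollary 1: multiplying by non-singular B and C does not change the rank.
A = [[1, 2, 3], [2, 4, 6]]                 # 2 x 3, rank 1
B = [[1, 1], [0, 1]]                       # non-singular, order 2
C = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]      # non-singular, order 3
assert rank(matmul(matmul(B, A), C)) == rank(A)
```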


COROLLARY 2. A square matrix which has a left or right inverse is non-singular.


Proof. Suppose that AB = E, where A and B are $n \times n$ matrices. Since rank E = n, inequality (4') can be rewritten in the form $n \le \min\{\mathrm{rank}\,A, \mathrm{rank}\,B\}$, which implies that rank A = rank B = n. But this is equivalent to A and B being non-singular (see Theorem 4 of §3). We similarly show that A is non-singular when there exists C for which CA = E. □


According to Corollary 2, a linear map $\varphi_A : \mathbb{R}^n \to \mathbb{R}^n$ which has either a left or right inverse must have a two-sided inverse. This is an example of a basic distinction between linear maps and general set maps (see Exercise 2 of §5 Ch. 1).


4. Equivalence classes of matrices. As in subsection 3 of §3, we let $E_{st}$ denote the $m \times m$ matrix having a one at the intersection of the s-th row and t-th column and zeros everywhere else.


We further introduce the so-called elementary matrices in $M_m(\mathbb{R})$:

(I) $F_{s,t} = E - E_{ss} - E_{tt} + E_{st} + E_{ts}$, $s \ne t$ (the identity matrix with its s-th and t-th rows interchanged);

(II) $F_{s,t}(\lambda) = E + \lambda E_{st}$, $s \ne t$ (the identity matrix with an additional entry $\lambda$ in position (s,t));

(III) $F_s(\lambda) = E + (\lambda - 1)E_{ss} = \mathrm{diag}\{1, \ldots, 1, \lambda, 1, \ldots, 1\}$, $\lambda \ne 0$.


Let A be any $m \times n$ matrix. One easily checks that if F is one of the first two types of matrix, then the matrix A' = FA is obtained from A by an elementary transformation of the rows of type (I) or (II), depending on whether $F = F_{s,t}$ or $F = F_{s,t}(\lambda)$, respectively. If $F = F_s(\lambda)$, we shall speak of an elementary transformation of type (III) (multiplying the s-th row $A_s$ by $\lambda$). Similarly, the matrix A'' = AF is obtained from A by an elementary transformation on the columns. From subsection 2 of §2 and Exercise 2 of §2, we know that a matrix A can be reduced to diagonal form by elementary transformations of type (I) and (II) on the rows and columns. Since

$$\begin{pmatrix} \mathrm{diag}\{a_1, \ldots, a_r\} & 0 \\ 0 & 0 \end{pmatrix} = F_1(a_1)F_2(a_2) \cdots F_r(a_r) \begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix},$$

it follows that, if we allow elementary transformations of type (III), we can reduce A to a matrix of the form

$$\begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix} \qquad (5)$$

(the zeros here denote matrices of size $r \times (n-r)$, $(m-r) \times r$, and $(m-r) \times (n-r)$). Thus,

$$P_sP_{s-1} \cdots P_1 A Q_1Q_2 \cdots Q_t = \begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix}, \qquad (6)$$

where $P_i$ and $Q_j$ are elementary matrices of order m and n, respectively. We have often stated that the elementary operations are invertible. This agrees with the fact that the inverse matrices exist and are matrices of the same type:

$$F_{s,t}^{-1} = F_{s,t}, \qquad F_{s,t}(\lambda)^{-1} = F_{s,t}(-\lambda), \qquad F_s(\lambda)^{-1} = F_s(\lambda^{-1}).$$

By the corollary to Theorem 4 of §3, the matrices $P = P_sP_{s-1} \cdots P_1$ and $Q = Q_1Q_2 \cdots Q_t$ also have inverses: $P^{-1} = P_1^{-1} \cdots P_{s-1}^{-1}P_s^{-1}$, $Q^{-1} = Q_t^{-1} \cdots Q_2^{-1}Q_1^{-1}$. Note that the $P_i^{-1}$ and $Q_j^{-1}$ are elementary matrices.
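The three types of elementary matrices, their action by left and right multiplication, and their inverses can be tried out directly. This is a sketch under assumed names (`F_swap`, `F_add`, `F_scale` stand for $F_{s,t}$, $F_{s,t}(\lambda)$, $F_s(\lambda)$, with 1-based indices as in the text); none of these function names come from the book.

```python
from fractions import Fraction

def identity(m):
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]

def F_swap(m, s, t):
    """Type (I): F_{s,t}, the identity with rows s and t interchanged."""
    F = identity(m)
    F[s - 1], F[t - 1] = F[t - 1], F[s - 1]
    return F

def F_add(m, s, t, lam):
    """Type (II): F_{s,t}(lam) = E + lam * E_{st}, with s != t."""
    F = identity(m)
    F[s - 1][t - 1] = Fraction(lam)
    return F

def F_scale(m, s, lam):
    """Type (III): F_s(lam) = diag{1, ..., lam, ..., 1}, with lam != 0."""
    F = identity(m)
    F[s - 1][s - 1] = Fraction(lam)
    return F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
E = identity(3)

# Left multiplication acts on rows, right multiplication on columns.
assert matmul(F_add(3, 2, 1, -4), A) == [[1, 2, 3], [0, -3, -6], [7, 8, 9]]
assert matmul(F_swap(3, 1, 2), A) == [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
assert matmul(A, F_swap(3, 1, 2)) == [[2, 1, 3], [5, 4, 6], [8, 7, 9]]

# The inverses are elementary matrices of the same type.
assert matmul(F_swap(3, 1, 3), F_swap(3, 1, 3)) == E
assert matmul(F_add(3, 2, 1, 5), F_add(3, 2, 1, -5)) == E
assert matmul(F_scale(3, 2, 4), F_scale(3, 2, Fraction(1, 4))) == E
```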
We call two $m \times n$ matrices A and B equivalent and write A ~ B if there exist non-singular matrices P and Q of size $m \times m$ and $n \times n$, respectively, such that B = PAQ.

It is easy to see that ~ is an equivalence relation: (i) A ~ A (choose $P = E_m$, $Q = E_n$); (ii) A ~ B $\Rightarrow$ B ~ A, since $B = PAQ \Rightarrow A = P^{-1}BQ^{-1}$; (iii) B = P'AQ' and C = P''BQ'' $\Rightarrow$ C = PAQ, where P = P''P', Q = Q'Q''. As always in the case of an equivalence relation (see §6 of Ch. 1), the set of all $m \times n$ matrices splits up into a disjoint union of equivalence classes. Since the ranks of equivalent matrices are equal (see Corollary 1 of Theorem 4), the argument leading to (6) shows that the matrices of the form (5) can be taken as representatives of equivalence classes. We have obtained the following fact.


THEOREM 5. The set of $m \times n$ matrices divides up into min(m,n)+1 equivalence classes. All matrices of rank r lie in the same class as the matrix (5). □


COROLLARY. Every non-singular $n \times n$ matrix is a product of elementary matrices.


Proof. Any such matrix is equivalent to the identity matrix, since both have rank n. Then the relation (6), when written in the form

$$A = P_1^{-1}P_2^{-1} \cdots P_s^{-1}\,Q_t^{-1} \cdots Q_2^{-1}Q_1^{-1}, \qquad (7)$$

gives the corollary. □


The corollary does not say that the representation of A as a product of elementary matrices is unique, but even the fact that such a representation exists is very useful. For example, such a representation can be used to find the inverse matrix. Namely, (7) gives us:

$$A^{-1} = Q_1Q_2 \cdots Q_t\,P_s \cdots P_2P_1 = QP.$$

To find $A^{-1}$, we first multiply A on the left by the $P_i$ to reduce it to triangular form. As we do this, we keep track of the sequence of matrices $E, P_1, P_2P_1, \ldots$. Once we have A reduced to the triangular matrix $P_s \cdots P_2P_1A$, we start multiplying on the right by the $Q_j$ to reduce it to the identity matrix, all the time keeping track of the sequence of matrices $E, Q_1, Q_1Q_2, \ldots$. Since each matrix $P_i$ or $Q_j$ merely corresponds to an elementary transformation, it follows that the chain of transformations performed simultaneously on E and A is not hard to carry out:

$$E \sim A \;\to\; P_1 \sim P_1A \;\to\; \cdots \;\to\; P_s \cdots P_1 \sim P_s \cdots P_1A \;\to\; Q_1 \sim P_s \cdots P_1AQ_1 \;\to\; \cdots \;\to\; Q_1 \cdots Q_t \sim P_s \cdots P_1AQ_1 \cdots Q_t$$

(although there may be a large number s+t of steps). The wavy lines separate the products we are interested in: the result of applying elementary row transformations on the identity matrix (in the first part), and the result of applying elementary column transformations on the identity matrix (in the second part). If it turns out that r < n, we conclude that A is a singular matrix and so does not have an inverse. If r = n, we need only multiply Q and P in order to obtain $A^{-1}$. Note that the order of performing the row and column operations can be reversed.


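We consider two examples. For the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$$

we have

$$E \sim \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \;\to\; F_{2,1}(-4) \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 7 & 8 & 9 \end{pmatrix} \;\to\; F_{3,1}(-7)F_{2,1}(-4) \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -12 \end{pmatrix} \;\to\; F_{3,2}(-2)F_{3,1}(-7)F_{2,1}(-4) \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 0 \end{pmatrix}.$$

Since the right side has a matrix in step form of rank 2, we must have rank A = 2. Hence, A does not have an inverse.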


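Here is another example. From a sequence

$$E \sim \begin{pmatrix} 0&0&0&1 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \end{pmatrix} \;\to\; F_{1,2} \sim \begin{pmatrix} 1&0&0&0 \\ 0&0&0&1 \\ 0&1&0&0 \\ 0&0&1&0 \end{pmatrix} \;\to\; F_{2,3}F_{1,2} \sim \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix} \;\to\; F_{3,4}F_{2,3}F_{1,2} \sim \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}$$

we find that

$$\begin{pmatrix} 0&0&0&1 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \end{pmatrix}^{-1} = F_{3,4}F_{2,3}F_{1,2} = \begin{pmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 1&0&0&0 \end{pmatrix}.$$

This method, sometimes called (P,Q)-reduction of a matrix to the normal form (5), is rather convenient for computing inverse matrices; however, it is premature to speak now of the advantages and disadvantages of this method, since our examples did not even use all of the types of matrices $F_{s,t}$, $F_{s,t}(\lambda)$, $F_s(\lambda)$.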
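The bookkeeping described above, performing the same elementary transformations on E and on A, can be sketched in a few lines. The sketch below is a simplified variant and not the (P,Q)-reduction itself: it uses row transformations only (so Q = E) and reduces A all the way to the identity; the function name `inverse_by_rows` is an assumption made for this example.

```python
from fractions import Fraction

def inverse_by_rows(A):
    """Return the inverse of the square matrix A, or None if rank A is less than n.

    The same elementary row transformations are applied to A and to E;
    when A has become E, the other matrix is the product P of the elementary matrices.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            return None                          # no pivot available: A is singular
        M[c], M[p] = M[p], M[c]                  # type (I) on A ...
        P[c], P[p] = P[p], P[c]                  # ... and on the tracked matrix
        lead = M[c][c]
        M[c] = [x / lead for x in M[c]]          # type (III)
        P[c] = [x / lead for x in P[c]]
        for i in range(n):
            if i != c:
                factor = M[i][c]                 # type (II)
                M[i] = [x - factor * y for x, y in zip(M[i], M[c])]
                P[i] = [x - factor * y for x, y in zip(P[i], P[c])]
    return P

print(inverse_by_rows([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # None: this matrix has rank 2
perm = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
assert inverse_by_rows(perm) == [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
```

For the two examples of the text the sketch reproduces the conclusions reached there: the first matrix has no inverse, and the second is inverted by three row interchanges.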


EXERCISES 


1. Prove the following rules for working with transpose matrices (see Exercise 1 of 
§2): 
t t 
US = hee 


${}^t(AB) = {}^tB\,{}^tA$


2. Prove that rank AB < min{ rank A, rank B} by a direct argument with matrices. 


3. Prove Sylvester's inequality: 


$\dim \mathrm{Ker}\,\varphi\psi \le \dim \mathrm{Ker}\,\varphi + \dim \mathrm{Ker}\,\psi$


for any two linear maps R° = ie Stes R’ F 


4. Prove that every linear map $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ of rank r can be written as a sum $\varphi = \varphi_1 + \cdots + \varphi_r$ of maps $\varphi_i$ of rank 1.




5. Find the rank of the matrix 


HV) Xq --- 


n 
Se ae 

iN 
; yy x2 Xn 


6. Show that the inverse of a matrix can be found by applying elementary transformations only to the rows (or only to the columns) of A.


Chapter 3. Determinants 


The formulas (3) and (9) in §4 Ch. 1 for solving square linear systems of order n = 2, 3 suggest the possibility of similar formulas for any n. In essence, what we need in order to generalize these formulas is a good interpretation of the numerators and denominators in those formulas. We shall show how to regard them as the values of a "universal" function det: $M_n(\mathbb{R}) \to \mathbb{R}$ from the set of $n \times n$ matrices to $\mathbb{R}$. The construction of the determinant function det will also answer many other questions concerning matrices which were raised in Chapter 2. In fact, the theory of determinants has much wider applications than those which we shall touch upon, and each application of this theory suggests a different possible method for constructing the determinant. One of the most natural approaches is the geometrical approach, which considers the determinant of a matrix as the volume of a multi-dimensional figure (see Exercise 3 of §4 Ch. 1). This method, based on the notion of exterior n-forms, would require us to go further into geometry, so we shall stick with an "analytic" approach.*
§1. Determinants: construction and basic properties


1. Construction by induction. We take the determinant of a $1 \times 1$ matrix $(a_{11})$ to be the number $a_{11}$. The determinants of $2 \times 2$ and $3 \times 3$ matrices are defined by formulas (2) and (8) of §4 Ch. 1. In the formula for the determinant of a $3 \times 3$ matrix, the $2 \times 2$ determinants which occur were intentionally left "un-expanded", in order to hint at the induction which we shall use to construct the determinant of an $n \times n$ matrix.


Suppose that we have already introduced determinants for matrices of order $1, 2, \ldots, n-1$. We define the determinant of an $n \times n$ matrix $A = (a_{ij})$ to be

$$D = a_{11}D_1 - a_{21}D_2 + \cdots + (-1)^{n-1}a_{n1}D_n, \qquad (1')$$

where $D_k$ is the determinant of the $(n-1) \times (n-1)$ matrix

$$\begin{pmatrix} a_{12} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{k-1,2} & \cdots & a_{k-1,n} \\ a_{k+1,2} & \cdots & a_{k+1,n} \\ \vdots & & \vdots \\ a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

which is obtained from A by crossing out the first column and the k-th row. It is easy to see that the expression (1') in the case n = 2, 3 agrees with the formulas (2) and (8) of §4 Ch. 1. The determinant of a matrix $A = (a_{ij})$ can be denoted $|a_{ij}|$, $|A|$, or det A. The vertical lines are used primarily when the matrix A has been given to us explicitly.


If we cross out the i-th row and j-th column in a matrix A, we obtain a square matrix of order n-1. The determinant of this $(n-1) \times (n-1)$ matrix is denoted $M_{ij}$ and is called the minor of A corresponding to the entry $a_{ij}$. Using this notation, (1') can be written

$$\det A = a_{11}M_{11} - a_{21}M_{21} + \cdots + (-1)^{n-1}a_{n1}M_{n1} = \sum_{k=1}^{n} (-1)^{k-1}a_{k1}M_{k1}. \qquad (1)$$

This formula for the determinant can be expressed in words as follows: the determinant of an $n \times n$ matrix is the sum of the products of an element in the first column by the corresponding minor, where the products are taken with alternating sign.


If we take the k-th column instead of the first, and replace the minors $M_{i1}$ by the minors $M_{ik}$, then, as we shall see later, the resulting expression differs from det A at most by a sign.


* There are many analytic methods for developing the theory of determinants. In this chapter, as in §4 of Ch. 1, we shall follow Shafarevich's lecture notes (Moscow University, 1971). In the first place, the practice in the use of induction is useful for its own sake. Moreover, the most useful methods for computing determinants are developed quickly. On the other hand, approaches based on "completely expanding the determinant" (see §3 Ch. 4) are in some sense simpler.
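The inductive definition (1) translates almost word for word into a recursive function. The sketch below is meant only to illustrate the definition (the name `det` and the 0-based indexing are choices made for this example); it performs on the order of n! operations and is not a practical method for large n.

```python
def det(A):
    """Determinant of a square matrix A (a list of rows), by expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        # Minor M_{k1}: cross out row k (0-based here) and the first column.
        minor = [row[1:] for i, row in enumerate(A) if i != k]
        # With 0-based k the sign (-1)**k matches (-1)**(k-1) for 1-based k in (1).
        total += (-1) ** k * A[k][0] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                     # -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))    # 0
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))    # 24
```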


In what follows, we adopt the convention of Chapter 2 that

$$A_i = (a_{i1}, a_{i2}, \ldots, a_{in}), \qquad A^{(j)} = [a_{1j}, a_{2j}, \ldots, a_{nj}]$$

denote the i-th row and the j-th column, respectively, of $A = (a_{ij})$. The matrix A can be represented either by listing its rows (i.e., as a column of rows)

$$A = [A_1, \ldots, A_n]$$

or by listing its columns (i.e., as a row of columns)

$$A = (A^{(1)}, \ldots, A^{(n)}).$$

We shall sometimes refer to the rows and columns of an $n \times n$ matrix A as the rows and columns of the n-th order determinant $|a_{ij}|$.

By definition, | | = det is a function which associates to a matrix A a number |A| = det A. Our first task is to study the behavior of this function when we change the rows or columns of A, considered as elements (vectors) of the vector space $\mathbb{R}^n$. It is sometimes useful to think of det A as a function

$$\det[A_1, \ldots, A_n] \quad \text{or} \quad \det(A^{(1)}, \ldots, A^{(n)})$$

of n variables which are vectors in $\mathbb{R}^n$. Functions of n variables were introduced in subsection 2 of §5 Ch. 1; such a function can be thought of either as a function of n variables $x_i \in X$ or as a function of one variable $x \in X^n$. In our example here, f = det and $X = \mathbb{R}^n$. We now discuss some general properties which a function $\mathcal{D}$ of n variables $A_i \in \mathbb{R}^n$ may have.


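We shall call a function $\mathcal{D} : [A_1, \ldots, A_n] \mapsto \mathcal{D}(A_1, \ldots, A_n)$ multilinear if it is linear in each argument $A_i$, i.e.,

$$\mathcal{D}(A_1, \ldots, \alpha A'_i + \beta A''_i, \ldots, A_n) = \alpha\mathcal{D}(A_1, \ldots, A'_i, \ldots, A_n) + \beta\mathcal{D}(A_1, \ldots, A''_i, \ldots, A_n)$$

(compare with subsection 1 of §3 Ch. 2). Such a multilinear function is called skew-symmetric if

$$\mathcal{D}(A_1, \ldots, A_i, A_{i+1}, \ldots, A_n) = -\mathcal{D}(A_1, \ldots, A_{i+1}, A_i, \ldots, A_n), \qquad i = 1, \ldots, n-1. \qquad (2)$$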
ae 


Remark 1. The definition of a linear function (see (4) of §3 Ch. 2) tells us that a function $\mathcal{D}$ is multilinear if and only if, for fixed $A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_n$ and for $A_i = X = (x_1, \ldots, x_n)$ we have

$$\mathcal{D}(A_1, \ldots, X, \ldots, A_n) = \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n,$$

where $\alpha_1, \ldots, \alpha_n$ are scalars which do not depend on $x_1, \ldots, x_n$.


Remark 2. A multilinear function $\mathcal{D}$ is skew-symmetric if and only if the following equality holds for all X:

$$\mathcal{D}(A_1, \ldots, A_{i-1}, X, X, A_{i+2}, \ldots, A_n) = 0. \qquad (2')$$

To see this, in one direction, if we set $A_i = A_{i+1} = X$ in (2), we arrive at (2'). Conversely, setting $X = A_i + A_{i+1}$ in (2') and using the multilinearity of $\mathcal{D}$, we obtain

$$0 = \mathcal{D}(\ldots, A_i + A_{i+1}, A_i + A_{i+1}, \ldots) = \mathcal{D}(\ldots, A_i, A_i, \ldots) + \mathcal{D}(\ldots, A_{i+1}, A_{i+1}, \ldots) + \mathcal{D}(\ldots, A_i, A_{i+1}, \ldots) + \mathcal{D}(\ldots, A_{i+1}, A_i, \ldots).$$

The first two terms vanish (as we see by setting $X = A_i$ and $X = A_{i+1}$ in (2')); hence, the sum of the last two terms is zero, and this gives us (2).


The same definitions and remarks carry over to functions $\mathcal{D}(A^{(1)}, \ldots, A^{(n)})$ of column-vectors. (Note that the skew-symmetry condition (2) makes sense for a function $\mathcal{D} : M^n \to \mathbb{R}$ from a cartesian power of any set M.)


We shall later need:


LEMMA 1. If any two arguments of a skew-symmetric function are interchanged, the value of the function changes sign.


Proof. Suppose we interchange the i-th and j-th arguments, where i < j. We proceed by induction on the number k = j-i-1 of arguments between the two we are interchanging. If k = 0, the lemma coincides with the definition of a skew-symmetric function. Suppose that the lemma is true whenever j-i-1 < k. Then

$$\begin{aligned} \mathcal{D}(\ldots, X_i, \ldots, X_{j-1}, X_j, \ldots) &= -\mathcal{D}(\ldots, X_i, \ldots, X_j, X_{j-1}, \ldots) \\ &= \mathcal{D}(\ldots, X_j, \ldots, X_i, X_{j-1}, \ldots) \\ &= -\mathcal{D}(\ldots, X_j, \ldots, X_{j-1}, X_i, \ldots). \end{aligned}$$ □


2. Basic properties of determinants. The notion of determinant only becomes usable when we derive the properties which are important both from a theoretical and computational point of view.

The trivial relation det(a + b) = det a + det b for first order determinants might suggest the mistaken conclusion that it holds for larger n, for example, for second order determinants. But if we look at the case n = 2, we find that the correct relationship is as follows:

$$\begin{vmatrix} \alpha x'_1 + \beta x''_1 & \alpha x'_2 + \beta x''_2 \\ a_{21} & a_{22} \end{vmatrix} = (\alpha x'_1 + \beta x''_1)a_{22} - (\alpha x'_2 + \beta x''_2)a_{21} = \alpha(x'_1a_{22} - x'_2a_{21}) + \beta(x''_1a_{22} - x''_2a_{21}) = \alpha\begin{vmatrix} x'_1 & x'_2 \\ a_{21} & a_{22} \end{vmatrix} + \beta\begin{vmatrix} x''_1 & x''_2 \\ a_{21} & a_{22} \end{vmatrix}.$$

(To see that det(a + b) $\ne$ det a + det b, take, for example, a = b = E, or a = E, b = -E.) We also note that

$$\begin{vmatrix} a_{21} & a_{22} \\ a_{11} & a_{12} \end{vmatrix} = -\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \qquad \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1.$$

We thus have evidence from the $2 \times 2$ case for the validity of


THEOREM 1. The function $A \mapsto \det A$ on the set $M_n(\mathbb{R})$ has the following properties:

D1. det A is a multilinear function of the rows of A, i.e., the determinant of a matrix is a linear function of the elements of any row $A_i$.

D2. det A is a skew-symmetric function of the rows of A, i.e., it vanishes if any two neighboring rows coincide.

D3. det E = 1.


Proof. We use induction on n. Properties D1-D3 have been verified when n = 1, 2. Suppose that they hold for all determinants of order < n. We use the formula (1) to prove D1-D3 for n-th order determinants. We start with D3. If

$$E = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix},$$

then in (1) we have $a_{k1} = 0$ for $k \ne 1$ and $a_{11} = 1$; hence det E = $M_{11}$. The determinant $M_{11}$ is the determinant of the $(n-1) \times (n-1)$ identity matrix, so, by the induction assumption, $M_{11} = 1$, and hence det E = 1.

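We shall prove properties D1 and D2 in a somewhat more general situation, which is described by the following lemma.

LEMMA 2. Let $\mathcal{D}_j : M_n(\mathbb{R}) \to \mathbb{R}$ be the function defined by the formula

$$\mathcal{D}_j(A) = a_{1j}M_{1j} - a_{2j}M_{2j} + \cdots + (-1)^{n-1}a_{nj}M_{nj}. \qquad (3)$$

Then:

$D1_j$. $\mathcal{D}_j$ is a multilinear function of the rows of A.

$D2_j$. $\mathcal{D}_j$ is a skew-symmetric function of the rows of A.


Proof. $D1_j$. In order to emphasize that we are considering the elements of the i-th row to be variables, we set $x_s = a_{is}$, $s = 1, \ldots, n$:

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i-1,1} & \cdots & a_{i-1,j} & \cdots & a_{i-1,n} \\ x_1 & \cdots & x_j & \cdots & x_n \\ a_{i+1,1} & \cdots & a_{i+1,j} & \cdots & a_{i+1,n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{pmatrix}.$$

The minor $M_{ij}$ does not depend on $x_1, \ldots, x_n$, and hence $\alpha_j = (-1)^{i-1}M_{ij}$ is a constant. Any other minor $M_{kj}$, $k \ne i$, has $(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n)$ as a row, and all of its other rows are constant. By the induction assumption, $M_{kj}$ is a linear function of the variables $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n$:

$$M_{kj} = \sum_{s \ne j} \alpha_{ks}x_s.$$

Now setting $\alpha_s = \sum_{k \ne i} (-1)^{k-1}\alpha_{ks}a_{kj}$, $s \ne j$, we arrive at the expression

$$\mathcal{D}_j(A) = \sum_{k=1}^{n} (-1)^{k-1}a_{kj}M_{kj} = \alpha_jx_j + \sum_{k \ne i} (-1)^{k-1}a_{kj}\sum_{s \ne j} \alpha_{ks}x_s = \alpha_jx_j + \sum_{s \ne j} \alpha_sx_s = \sum_{s=1}^{n} \alpha_sx_s,$$

which tells us that $\mathcal{D}_j$ is a linear function of the elements $x_1, \ldots, x_n$ of the i-th row of A.


$D2_j$. According to Remark 2 of subsection 1, we may just as well prove that $\mathcal{D}_j(A) = 0$ for any matrix

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ x_1 & \cdots & x_j & \cdots & x_n \\ x_1 & \cdots & x_j & \cdots & x_n \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{pmatrix}$$

with two identical rows $A_i = A_{i+1} = (x_1, \ldots, x_j, \ldots, x_n)$. The minor $M_{kj}$, $k \ne i, i+1$, also has two identical rows, namely, the row $(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n)$ of length n-1. By the induction assumption, $M_{kj} = 0$ for $k \ne i, i+1$. The formula (3) can now be rewritten in the form

$$\mathcal{D}_j(A) = (-1)^{i-1}x_jM_{ij} + (-1)^{i}x_jM_{i+1,j}.$$

But obviously $M_{ij} = M_{i+1,j}$. Hence,

$$\mathcal{D}_j(A) = (-1)^{i-1}x_j(M_{ij} - M_{i+1,j}) = 0.$$ □

Setting j = 1 in (3) and comparing the resulting expression with (1), we conclude that

$$\mathcal{D}_1(A) = \det A. \qquad (4)$$

Hence, properties D1 and D2 of determinants are contained in the lemma. Theorem 1 is proved. □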

We write out property D1 in more detail:

D1'. $\det[A_1, \ldots, \lambda A_i, \ldots, A_n] = \lambda\det[A_1, \ldots, A_i, \ldots, A_n]$, i.e., if a row $A_i$ in a determinant is multiplied by $\lambda$, the determinant is also multiplied by $\lambda$. In particular, if all of the rows are multiplied by $\lambda$, one obtains the formula

$$\det \lambda A = \lambda^n \det A.$$

D1''. If for some i all of the elements in $A_i$ are of the form $a_{ij} = a'_{ij} + a''_{ij}$, then det A = det A' + det A'', where $A'_j = A''_j = A_j$ for $j \ne i$ and $A'_i = (a'_{i1}, \ldots, a'_{in})$, $A''_i = (a''_{i1}, \ldots, a''_{in})$.

Theorem 1 implies several other simple assertions which we shall state in the form of properties of determinants, but shall prove for any of the functions $\mathcal{D}_j$ given by (3). We know from (4) that the determinant is a special case of a $\mathcal{D}_j$.


D4. A determinant with a zero row is equal to zero.

Suppose that $A_i = (0, 0, \dots, 0)$. Then also $2A_i = (0, \dots, 0)$. Consequently, by D1',

\[ \mathcal{D}_j(A) = \mathcal{D}_j[\dots, 2A_i, \dots] = 2\,\mathcal{D}_j(A) , \]

and hence $\mathcal{D}_j(A) = 0$. □


D5. If any two rows (not necessarily neighboring ones) are interchanged, the determinant changes sign.

This property follows from D2 and Lemma 1 for any function $\mathcal{D}_j$. □

D6. If any two rows coincide, then the determinant is zero.

We again take any function $\mathcal{D}_j$. Suppose $A_s$ and $A_t$ are the rows that coincide. If we interchange $A_s$ and $A_t$, the matrix remains the same. On the other hand, according to D5, which we proved for any function $\mathcal{D}_j$, interchanging these two rows changes the sign of the determinant. Thus, $\mathcal{D}_j(A) = -\mathcal{D}_j(A)$, and hence $\mathcal{D}_j(A) = 0$. □


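D7. The determinant does not change if an elementary transformation of type (II) is performed on its rows.

Suppose we add $\lambda$ times the t-th row of A to the s-th row of A to obtain a matrix A'. Then, by properties D1 and D6 (which we proved for $\mathcal{D}_j$), we have

\[
\mathcal{D}_j(A') = \mathcal{D}_j[\dots, A_s + \lambda A_t, \dots, A_t, \dots]
= \mathcal{D}_j[\dots, A_s, \dots, A_t, \dots] + \lambda\,\mathcal{D}_j[\dots, A_t, \dots, A_t, \dots]
= \mathcal{D}_j(A) + \lambda \cdot 0 = \mathcal{D}_j(A) . \qquad \square
\]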
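Properties D5-D7 can be checked on a small numerical example. The following is only a sketch: the function `det` implements the expansion along the first column (formula (1) of §1), and the helper names `swap_rows` and `add_multiple`, as well as the sample matrix, are chosen here for illustration and do not come from the text.

```python
def det(a):
    """Determinant by expansion along the first column (formula (1) of §1)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        # Minor M_{k1}: delete row k and the first column.
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total

def swap_rows(a, s, t):
    """Elementary transformation of type (I): interchange rows s and t."""
    b = [row[:] for row in a]
    b[s], b[t] = b[t], b[s]
    return b

def add_multiple(a, s, t, lam):
    """Elementary transformation of type (II): add lam times row t to row s."""
    b = [row[:] for row in a]
    b[s] = [x + lam * y for x, y in zip(b[s], b[t])]
    return b

A = [[2, 1, 3], [0, -1, 4], [5, 2, 1]]
d = det(A)                            # the determinant itself
d5 = det(swap_rows(A, 0, 2))          # D5: sign changes
d6 = det([A[0], A[1], A[0]])          # D6: two equal rows give zero
d7 = det(add_multiple(A, 0, 2, 3))    # D7: value is unchanged
print(d, d5, d6, d7)
```

Integer entries keep the arithmetic exact, so the comparisons are not affected by rounding.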
These properties make it relatively simple to compute an n-th order determinant. One method for doing this is as follows. We know that we can reduce a matrix $A = (a_{ij})$ to triangular form (see §3 of Ch. 1) by elementary row transformations. We thereby obtain a matrix

\[
\bar A = \begin{pmatrix}
\bar a_{11} & \bar a_{12} & \cdots & \bar a_{1n} \\
0 & \bar a_{22} & \cdots & \bar a_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \bar a_{nn}
\end{pmatrix} . \qquad (5)
\]

Suppose that q elementary transformations of type (I) were used in the reduction process. Since elementary transformations of type (II) do not change the determinant (property D7), while each elementary transformation of type (I) multiplies the determinant by (-1), it follows that $\det A = (-1)^q \det \bar A$. We shall prove that

\[ \det \bar A = \bar a_{11} \bar a_{22} \cdots \bar a_{nn} . \]

Once we prove this, we shall have the formula

\[ \det A = (-1)^q\, \bar a_{11} \bar a_{22} \cdots \bar a_{nn} . \qquad (6) \]

The row reduction procedure, combined with this formula, is one method for computing det A.
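We prove the formula for $\det \bar A$ by induction on n. Since $\bar a_{21} = \cdots = \bar a_{n1} = 0$, it follows by (1) that $\det \bar A = \bar a_{11} \bar M_{11}$, where

\[
\bar M_{11} = \begin{vmatrix}
\bar a_{22} & \bar a_{23} & \cdots & \bar a_{2n} \\
0 & \bar a_{33} & \cdots & \bar a_{3n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \bar a_{nn}
\end{vmatrix}
\]

is a determinant of order n-1. By the induction assumption, $\bar M_{11} = \bar a_{22} \bar a_{33} \cdots \bar a_{nn}$. Hence, $\det \bar A = \bar a_{11} \bar M_{11} = \bar a_{11} \bar a_{22} \cdots \bar a_{nn}$.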
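The row reduction method behind formula (6) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the book's own procedure: the name `det_by_row_reduction` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that no rounding occurs.

```python
from fractions import Fraction

def det_by_row_reduction(a):
    """Reduce to triangular form and apply det A = (-1)^q * (product of diagonal)."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    sign = 1  # keeps track of (-1)^q, q = number of type (I) transformations
    for col in range(n):
        # Look for a non-zero entry in this column, on or below the diagonal.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero will remain on the diagonal
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]  # type (I): sign changes
            sign = -sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Type (II): subtract a multiple of the pivot row (property D7).
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]
    return result

value = det_by_row_reduction([[0, 2, 1], [1, 1, 0], [3, 0, 2]])
print(value)
```

In this example one row interchange is needed (q = 1), and the triangular form has diagonal entries 1, 2, 7/2.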
Now, based on (6), we establish an important fact concerning the role of properties D1-D3 of a determinant. Namely, up to a constant factor det is the only function satisfying D1 and D2 (and that constant factor is 1 if D3 also holds).

THEOREM 2. Let $\mathcal{D}: M_n(\mathbb{R}) \to \mathbb{R}$ be any function having the following properties:
(i) $\mathcal{D}(A)$ is a linear function of the elements of each row of the matrix $A \in M_n(\mathbb{R})$;
(ii) when two neighboring rows are interchanged, $\mathcal{D}(A)$ changes sign (in other words, $\mathcal{D}(A)$ is a multilinear skew-symmetric function of the rows of the matrix).

Then there exists a constant $\rho$ independent of A such that

\[ \mathcal{D}(A) = \rho \cdot \det A . \]

The number $\rho$ is determined by the equality $\rho = \mathcal{D}(E)$, where E is the identity matrix.


Proof. By Lemma 1, $\mathcal{D}(A)$ changes sign when any two rows are interchanged, i.e., under any elementary transformation of type (I). Furthermore, an argument similar to that given in the proof of property D7 shows that $\mathcal{D}(A)$ does not change if an elementary transformation of type (II) is performed on the rows of A.

Using elementary transformations, we reduce A to the triangular form (5), where, of course, some of the $\bar a_{ii}$ might equal zero. Using what we now know, we have the two formulas

\[ \det A = (-1)^q \det \bar A = (-1)^q\, \bar a_{11} \bar a_{22} \cdots \bar a_{nn} , \qquad (6) \]

\[ \mathcal{D}(A) = (-1)^q\, \mathcal{D}(\bar A) , \]

where q is the number of elementary transformations of type (I) used to go from A to $\bar A$. The required equality $\mathcal{D}(A) = \rho \cdot \det A$, where $\rho = \mathcal{D}(E)$, will then follow if we prove the formula

\[ \mathcal{D}(\bar A) = \mathcal{D}(E)\, \bar a_{11} \bar a_{22} \cdots \bar a_{nn} . \qquad (7) \]

We now prove (7). By condition (i) of the theorem, we can bring $\bar a_{nn}$ outside the $\mathcal{D}$:

\[
\mathcal{D}(\bar A) = \bar a_{nn}\, \mathcal{D}\begin{pmatrix}
\bar a_{11} & \cdots & \bar a_{1,n-1} & \bar a_{1n} \\
 & \ddots & \vdots & \vdots \\
 & & \bar a_{n-1,n-1} & \bar a_{n-1,n} \\
0 & \cdots & 0 & 1
\end{pmatrix} .
\]

We now apply elementary transformations of type (II): for each i < n, subtract $\bar a_{in}$ times the last row of the matrix after the $\mathcal{D}$ on the right from the i-th row of this matrix. This changes all of the elements in the last column to zero (except for the entry 1 in the last row), but keeps all other elements in the matrix the same. We then move to the second-to-last row and proceed in the same manner, and so on. Each time we take the element $\bar a_{ii}$ outside the $\mathcal{D}$. After doing this n times, we finally obtain

\[ \mathcal{D}(\bar A) = \bar a_{nn} \bar a_{n-1,n-1} \cdots \bar a_{11}\, \mathcal{D}(E) , \]

which is precisely (7). □


Thus, properties D1-D3 characterize the det function uniquely. For this reason, we consider those three properties to be the most fundamental properties of determinants. It would have been possible from the very beginning to define the determinant to be any function $\mathcal{D}$ having properties D1-D3, but in that case it would have been necessary to prove the existence of such a function. Our approach guaranteed the existence of det, because we constructed it by formula (1).

Because of later applications of Theorem 2, we did not insist on the normalization $\mathcal{D}(E) = 1$ for the function $\mathcal{D}$ in the theorem.




EXERCISES 


1. Using formula (1) and the rule for the signs in the expansion of a third order determinant (Exercise 1 of §4 Ch. 1), write out completely all of the products occurring in the expansion of a fourth order determinant. Take note of the total number of terms in the expansion, and also try to find a rule for determining the sign of each term.

2. There are n terms on the right in formula (1). Each minor $M_{i1}$, in turn, is equal to a linear combination of n-1 minors of order n-2, and so on. Altogether the expansion of an n-th order determinant $\det(a_{ij})$ consists of $n(n-1) \cdots 3 \cdot 2 \cdot 1 = n!$ ("n factorial") products of the form $a_{i_1 1} a_{i_2 2} \cdots a_{i_n n}$, each with a + or -. Show that

\[ \det(a_{ij}) = a_{11} a_{22} \cdots a_{nn} + \cdots + (-1)^{\frac{n(n-1)}{2}} a_{n1} a_{n-1,2} \cdots a_{1n} . \]

3. Using the remarks in the preceding exercise, applied to the determinant $\det(a_{ij})$ of the matrix all of whose entries $a_{ij}$ equal 1, prove that, in the expansion of an n-th order determinant, exactly half of the products $a_{i_1 1} a_{i_2 2} \cdots a_{i_n n}$ occur with a + sign.

4. Write the following skew-symmetric function $\Delta: \mathbb{R}^3 \to \mathbb{R}$ of three variables x, y, z in the form of a third order determinant:

\[ \Delta(x, y, z) = (y - x)(z - x)(z - y) . \]


§2. Further properties of determinants 


1. Expanding the determinant along an arbitrary column. We are now in a position to answer the question which naturally arose when we first constructed the function det: does the first column play any special role that explains its appearance in formula (1) of §1? The answer to this question is given by the following formula, which expresses the determinant in terms of the minors obtained from going down any j-th column:

\[ \det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij} . \qquad (1) \]

To prove (1), we apply Theorem 2 of §1 to the function $\mathcal{D}_j$ in Lemma 2 of §1. We obtain the relation:

\[ \mathcal{D}_j(A) = \mathcal{D}_j(E) \cdot \det A . \]

But, by formula (3) of §1, $\mathcal{D}_j(E) = (-1)^{j-1}$. Hence, $\mathcal{D}_j(A) = (-1)^{j-1} \det A$. After multiplying both sides of this equation by $(-1)^{j-1}$, we have: $\det A = (-1)^{j-1} \mathcal{D}_j(A)$, which is merely another way of writing (1). This formula becomes more symmetrical if we introduce the so-called cofactor $A_{ij} = (-1)^{i+j} M_{ij}$ of the element $a_{ij}$ in the matrix A. We state our result in the following theorem.

THEOREM 1. The determinant of a matrix A is equal to the sum of the products of each element of a given column by the corresponding cofactor:

\[ \det A = \sum_{i=1}^{n} a_{ij} A_{ij} . \qquad (2) \]

This theorem shows that all columns can play the same role. If j=1, we obtain our original formula (1) of §1, which we used to define the determinant. We say that formulas (1) and (2) give the expansion of a determinant along the j-th column.

It is tempting to compare (2) with the analogous expression one obtains if one sums over the second index for fixed i: $\sum_{j=1}^{n} a_{ij} A_{ij}$. We shall soon see that this gives the same value det A.
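The expansion (2) along an arbitrary column, and the analogous sum along a row, can be compared numerically. The sketch below assumes 0-based indices (the parity of i+j is the same as with 1-based indices); the function names are illustrative and not taken from the text.

```python
def minor_matrix(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Reference value: expansion along the first column (formula (1) of §1)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(minor_matrix(a, i, 0)) for i in range(n))

def cofactor(a, i, j):
    """Cofactor A_ij = (-1)^(i+j) M_ij."""
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))

def expand_along_column(a, j):
    """Right side of formula (2): sum over i of a_ij * A_ij."""
    return sum(a[i][j] * cofactor(a, i, j) for i in range(len(a)))

def expand_along_row(a, i):
    """The analogous sum over the second index for a fixed row i."""
    return sum(a[i][j] * cofactor(a, i, j) for j in range(len(a)))

A = [[2, 1, 3], [0, -1, 4], [5, 2, 1]]
column_values = [expand_along_column(A, j) for j in range(3)]
row_values = [expand_along_row(A, i) for i in range(3)]
print(column_values, row_values)
```

All six expansions agree with `det(A)`, which is what Theorem 1 (and the row analog announced above) predicts.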


2. The properties of determinants relating to columns. As an application of Theorem 1, we obtain a whole new series of properties of determinants.

THEOREM 2. Properties D1-D7 of §1 hold for the columns as well as the rows of a determinant.

Proof. It is clear from §1 that the properties D4-D7 are completely formal consequences of D1-D3, and hence, in order to prove the analogous properties for the columns, it suffices to prove the first three of them for the columns. Notice that the "normalization" property D3 is of a special sort, and does not relate to the rows or columns, i.e., it is not affected if we replace rows by columns in the list of properties. Thus, we need only consider properties D1 and D2.

We first want to show that for any j, if the entries of A not in the j-th column are fixed, then det A is a linear function of the $a_{ij}$, $i = 1, \dots, n$. We simply use formula (2). It shows directly that det A is a linear function of the elements of the j-th column, since the cofactor $A_{ij}$ does not depend on these elements. This gives us D1.

We now prove D2, i.e., the skew-symmetry of the function $\det[A^{(1)}, \dots, A^{(n)}]$ of the columns, by induction on n. If n=1, property D2 does not assert anything. If n=2, then D2 is easy to verify directly:

\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc = -\begin{vmatrix} b & a \\ d & c \end{vmatrix} . \]

Suppose n > 2. Suppose we interchange the columns $A^{(k)}$ and $A^{(k+1)}$. We use formula (2) with $j \ne k, k+1$. Both of the columns $A^{(k)}$, $A^{(k+1)}$ occur in the minor $M_{ij}$ (or, equivalently, in the cofactor $A_{ij}$), but they are shortened: they occur without the elements $a_{ik}$, $a_{i,k+1}$. By the induction assumption, when the two columns are interchanged each minor changes sign. Thus, we have

\[ \det[\dots, A^{(k)}, A^{(k+1)}, \dots] = -\det[\dots, A^{(k+1)}, A^{(k)}, \dots] . \qquad \square \]

3. The transpose determinant. We first recall a concept introduced in Exercise 1 of §2 Ch. 2. The n×m rectangular matrix whose i-th column, $i = 1, 2, \dots, m$, coincides with the i-th row of the m×n matrix A is called the transpose of A. We denote the transpose of A by the symbol ${}^tA$ or else A'. Thus, if $A = (a_{ij})$ and ${}^tA = (a'_{ij})$, then $a'_{ij} = a_{ji}$. For example,




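\[ {}^t\!\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix} . \]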


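A column may be considered as the transpose of a row:

\[ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = {}^t(x_1, \dots, x_n) . \]

In the case of a square matrix, one sometimes refers to the determinant of the transpose

\[
\det {}^tA = \begin{vmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{vmatrix}
\]

as the transpose determinant. The operation of transposing a matrix or determinant can be represented visually as rotation around the main diagonal, which consists of the elements $a_{11}, a_{22}, \dots, a_{nn}$.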

THEOREM 3. The determinant of the transpose of a matrix is equal to the determinant of the original matrix:

\[ \det {}^tA = \det A . \]

Proof. We consider the function $\mathcal{D}: M_n(\mathbb{R}) \to \mathbb{R}$ which is the composition $A \mapsto {}^tA \mapsto \det {}^tA$ of the transposing function with the determinant function. The function $\mathcal{D}$ has properties (i) and (ii) in Theorem 2 of §1. Namely, by Theorem 2 of this section, the function ${}^tA \mapsto \det {}^tA$ has properties D1-D7 relative to the columns of ${}^tA$, i.e., relative to the rows of A. Thus, $\mathcal{D}$ is a multilinear skew-symmetric function of the rows of the matrix. By Theorem 2 of §1, we have $\mathcal{D}(A) = \mathcal{D}(E) \cdot \det A = \det {}^tE \cdot \det A$. But ${}^tE = E$, so that $\det {}^tE = 1$. Hence, $\mathcal{D}(A) = \det A$. □

Note how useful Theorem 2 of §1 has been in providing short, non-computational proofs of Theorems 1 and 3 of this section.


Theorem 3 tells us that the rows and columns of a determinant play equivalent roles: the properties expressed in terms of the rows can also be expressed in terms of the columns, and vice-versa. For example, in addition to Theorem 1 on expanding a determinant along a column, we have

THEOREM 1'. The determinant of a matrix A is equal to the sum of the products of each element of any fixed row by the corresponding cofactor:

\[ \det A = \sum_{j=1}^{n} a_{ij} A_{ij} . \]

We also have the following useful criterion for the vanishing of a determinant: if any row (any column) of det A is a linear combination of the other rows (resp. columns), then det A = 0 (see properties D1', D1'' and their analogs for columns).
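The following two examples illustrate the properties of determinants.

Example 1. The so-called Vandermonde determinant

\[
\Delta_n = \begin{vmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_n \\
x_1^2 & x_2^2 & \cdots & x_n^2 \\
\vdots & \vdots & & \vdots \\
x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1}
\end{vmatrix} = \Delta(x_1, x_2, \dots, x_n)
\]

is given by the formula

\[ \Delta_n = \prod_{1 \le i < j \le n} (x_j - x_i) , \qquad (3) \]

or, writing this expression out fully, we have

\[ \Delta_n = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1)(x_3 - x_2) \cdots (x_n - x_2) \cdots (x_n - x_{n-1}) \]

(in this connection, it is useful to look back at Exercise 4 of §1). In particular, if the elements $x_1, \dots, x_n$ are pairwise distinct, the Vandermonde determinant is non-zero. This property is often useful. By Theorem 3, we also have

\[
\Delta_n = \begin{vmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{vmatrix} .
\]

We use induction on n to prove (3). Suppose that $\Delta_m$ is given by (3) for m < n. Using property D7, we subtract $x_1$ times the (i-1)-th row from the i-th row, for each i, starting with the last row and working upward:

\[
\Delta_n = \begin{vmatrix}
1 & 1 & \cdots & 1 \\
0 & x_2 - x_1 & \cdots & x_n - x_1 \\
0 & x_2^2 - x_1 x_2 & \cdots & x_n^2 - x_1 x_n \\
\vdots & \vdots & & \vdots \\
0 & x_2^{n-1} - x_1 x_2^{n-2} & \cdots & x_n^{n-1} - x_1 x_n^{n-2}
\end{vmatrix} .
\]

The natural next step is to expand $\Delta_n$ along the first column, and take the common factor $x_{j+1} - x_1$ in the j-th column ($j = 1, 2, \dots, n-1$) of the resulting (n-1)-th order determinant outside the determinant sign (using property D1' for columns). We arrive at the expression

\[
\Delta_n = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1)
\begin{vmatrix}
1 & 1 & \cdots & 1 \\
x_2 & x_3 & \cdots & x_n \\
\vdots & \vdots & & \vdots \\
x_2^{n-2} & x_3^{n-2} & \cdots & x_n^{n-2}
\end{vmatrix}
= (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1) \cdot \Delta(x_2, x_3, \dots, x_n) ,
\]

which coincides with (3), since, by the induction assumption,

\[ \Delta(x_2, \dots, x_n) = \prod_{2 \le i < j \le n} (x_j - x_i) . \]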
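Formula (3) is easy to test on concrete numbers. The sketch below builds the Vandermonde matrix in the row layout used above (row k holds the k-th powers) and compares its determinant with the product of differences; the helper names and the sample values are assumptions made for illustration.

```python
from itertools import combinations
from math import prod

def det(a):
    """Determinant by expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total

def vandermonde(xs):
    """Matrix whose k-th row (k = 0, ..., n-1) is (x_1^k, ..., x_n^k)."""
    return [[x ** k for x in xs] for k in range(len(xs))]

def difference_product(xs):
    """Right side of (3): product of (x_j - x_i) over all pairs i < j."""
    return prod(xs[j] - xs[i] for i, j in combinations(range(len(xs)), 2))

xs = [2, 3, 5, 7]
lhs = det(vandermonde(xs))
rhs = difference_product(xs)
print(lhs, rhs)
```

Repeating one of the values, for example `xs = [2, 3, 3, 7]`, makes both sides vanish, in line with the remark that the determinant is non-zero exactly when the elements are pairwise distinct.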




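Example 2. A matrix $A = (a_{ij})$ of the form

\[
A = \begin{pmatrix}
0 & a_{12} & a_{13} & \cdots & a_{1n} \\
-a_{12} & 0 & a_{23} & \cdots & a_{2n} \\
-a_{13} & -a_{23} & 0 & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
-a_{1n} & -a_{2n} & -a_{3n} & \cdots & 0
\end{pmatrix}
\]

is called skew-symmetric (its determinant is also called skew-symmetric). In other words, ${}^tA = -A$. By Theorem 3, we have

\[ \det A = \det {}^tA = \det(-A) = (-1)^n \det A , \]

so that $[1 + (-1)^{n-1}] \det A = 0$. For odd n we obtain det A = 0, i.e., every skew-symmetric matrix of odd order has zero determinant.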
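Theorem 3 and Example 2 can both be illustrated on small integer matrices. This is a minimal sketch with sample matrices chosen here; the fourth order case is included only to show that the argument above says nothing for even n.

```python
def det(a):
    """Determinant by expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total

def transpose(a):
    """Rows of the result are the columns of a."""
    return [list(col) for col in zip(*a)]

A = [[2, 1, 3], [0, -1, 4], [5, 2, 1]]
same = det(transpose(A)) == det(A)      # Theorem 3

# A skew-symmetric matrix of odd order 3: the determinant vanishes.
S3 = [[0, 2, -1], [-2, 0, 4], [1, -4, 0]]
odd_value = det(S3)

# A skew-symmetric matrix of even order 4: the determinant need not vanish.
S4 = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
even_value = det(S4)
print(same, odd_value, even_value)
```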


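4. Determinants of special matrices. The more zeros among the elements of A and the "better" their location, the easier it is to compute det A. This intuitive idea in certain cases leads to an exact formula. For example, we know (see subsection 2 of §1) that the determinant of an (upper or lower) triangular matrix is equal to the product of the elements on the main diagonal. Another important special case is

THEOREM 4. A determinant D of order n+m containing zeros in the intersection of the first n columns and the last m rows is given by the formula

\[
D = \begin{vmatrix}
a_{11} & \cdots & a_{1n} & a_{1,n+1} & \cdots & a_{1,n+m} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{nn} & a_{n,n+1} & \cdots & a_{n,n+m} \\
0 & \cdots & 0 & b_{11} & \cdots & b_{1m} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{m1} & \cdots & b_{mm}
\end{vmatrix}
= \begin{vmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}
\cdot
\begin{vmatrix}
b_{11} & \cdots & b_{1m} \\
\vdots & & \vdots \\
b_{m1} & \cdots & b_{mm}
\end{vmatrix}
\]

(a determinant of the form on the left is sometimes called a quasi-triangular determinant or a determinant with zero corner).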


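Proof. First fix the n(n+m) elements $a_{ij}$ and consider D as a function of the elements $b_{kl}$, which make up an m-th order square matrix B. We thus consider the determinant as a function of the matrix B: $D = \mathcal{D}(B)$.

Clearly, because the determinant D is multilinear and skew-symmetric relative to the last m rows, the function $\mathcal{D}(B)$ has the same properties relative to the rows of B. Hence, we may apply Theorem 2 of §1 to $\mathcal{D}(B)$, and conclude that $\mathcal{D}(B) = \mathcal{D}(E) \cdot \det B$. By the definition of $\mathcal{D}$, we have:

\[
\mathcal{D}(E) = \begin{vmatrix}
a_{11} & \cdots & a_{1n} & a_{1,n+1} & \cdots & a_{1,n+m} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{nn} & a_{n,n+1} & \cdots & a_{n,n+m} \\
0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 1
\end{vmatrix} .
\]

We expand $\mathcal{D}(E)$ along the last row (see (2)), obtaining a minor which we then expand along its last row, i.e., the second-to-last row of this matrix, and so on. Repeating this operation m times, we see that $\mathcal{D}(E) = \det A$, where $A = (a_{ij})$ is the n×n matrix formed by the first n rows and the first n columns.

We finally obtain: $D = \mathcal{D}(B) = \det A \cdot \det B$. □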
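We can introduce a more compact notation for Theorem 4:

\[ \det \begin{pmatrix} A & C \\ 0 & B \end{pmatrix} = \det A \cdot \det B . \qquad (4) \]

Here A and B are square matrices, and the 0 matrix and C are rectangular matrices. Combining Theorems 3 and 4 (or else repeating an argument completely analogous to the proof of Theorem 4), we easily see that, similarly,

\[ \det \begin{pmatrix} A & 0 \\ C & B \end{pmatrix} = \det A \cdot \det B . \]

One might try to write a similar formula for $\det \begin{pmatrix} C & A \\ B & 0 \end{pmatrix}$, but note that the simplest possible example $\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -1$ shows that there is a problem with the sign. To obtain a correct formula, we might first permute the rows or columns, in order to transform the matrix $\begin{pmatrix} C & A \\ B & 0 \end{pmatrix}$ to the form $\begin{pmatrix} A & C \\ 0 & B \end{pmatrix}$ or $\begin{pmatrix} B & 0 \\ C & A \end{pmatrix}$.

A simpler approach is based on Theorem 2 of §1, which we have already used several times. Namely, using that theorem as in the proof of Theorem 4 above, we obtain

\[ \det \begin{pmatrix} C & A \\ B & 0 \end{pmatrix} = \det \begin{pmatrix} C & A \\ E & 0 \end{pmatrix} \cdot \det B . \]

Next, by formula (1) applied m times, we find

\[
\det \begin{pmatrix} C & A \\ E & 0 \end{pmatrix} =
\begin{vmatrix}
c_{11} & \cdots & c_{1m} & a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots & \vdots & & \vdots \\
c_{n1} & \cdots & c_{nm} & a_{n1} & \cdots & a_{nn} \\
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0
\end{vmatrix}
= (-1)^{(n+2)+(n+4)+\cdots+(n+2m)} \det A = (-1)^{nm} \det A .
\]

We conclude that, if A and B are square matrices of order n and m, respectively, then

\[ \det \begin{pmatrix} C & A \\ B & 0 \end{pmatrix} = (-1)^{nm} \det A \cdot \det B . \qquad (5) \]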
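Formulas (4) and (5) can be checked by assembling the block matrices explicitly. In the sketch below the helpers `hstack` and `zeros` and all sample blocks are illustrative choices; A has order n, B has order m, and C is an n×m block.

```python
def det(a):
    """Determinant by expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total

def hstack(x, y):
    """Place the blocks x and y side by side (same number of rows)."""
    return [r1 + r2 for r1, r2 in zip(x, y)]

def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]

B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]          # m = 3

# Case n = 2, m = 3: nm is even, so (5) has sign +1.
A = [[1, 2], [3, 4]]
C = [[1, 0, 2], [0, 1, 1]]
upper = hstack(A, C) + hstack(zeros(3, 2), B)  # matrix in formula (4)
anti = hstack(C, A) + hstack(B, zeros(3, 2))   # matrix in formula (5)

# Case n = 1, m = 3: nm is odd, so (5) has sign -1.
A1 = [[3]]
C1 = [[1, 4, 2]]
anti_odd = hstack(C1, A1) + hstack(B, zeros(3, 1))

print(det(upper), det(anti), det(anti_odd))
```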


Formulas (4) and (5) are both special cases of a general theorem of Laplace on 
expanding determinants. But this theorem is not often used, and we shall not discuss it. We 
are also in no hurry to derive the so-called complete determinant expansion theorem (see §3 
of Chapter 4), since that complete expansion is of little use from a computational point of 
view. 


A very important property of determinants is 


THEOREM 5. If A and B are n×n matrices, then

\[ \det AB = \det A \cdot \det B . \]


Proof. According to formulas (7) and (9) of §3 Ch. 2, which express the entries $c_{ij}$ in the matrix $(c_{ij}) = AB = \left(\sum_k a_{ik} b_{kj}\right)$ in terms of the entries in the matrices A and B, the i-th row $(AB)_i$ is given by

\[ (AB)_i = (A_i B^{(1)}, A_i B^{(2)}, \dots, A_i B^{(n)}), \qquad A_i B^{(j)} = \sum_{k=1}^{n} a_{ik} b_{kj} . \]

Fix a matrix B, and for any matrix A set

\[ \mathcal{D}_B(A) = \det AB . \]

We show that the function $\mathcal{D}_B$ satisfies the conditions (i), (ii) of Theorem 2 §1. In fact, we know that det AB is a linear function of the elements of the i-th row $(AB)_i$:

\[ \det AB = \lambda_1 (A_i B^{(1)}) + \lambda_2 (A_i B^{(2)}) + \cdots + \lambda_n (A_i B^{(n)}) . \]

Hence,

\[ \mathcal{D}_B(A) = \sum_{j=1}^{n} \lambda_j \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k=1}^{n} a_{ik} \sum_{j=1}^{n} \lambda_j b_{kj} = \sum_{k=1}^{n} \mu_k a_{ik} , \]

where $\mu_k = \sum_{j=1}^{n} \lambda_j b_{kj}$ is a scalar which does not depend on the elements of the i-th row $A_i$ of A. We thus see that $\mathcal{D}_B(A)$ depends linearly on the elements of the i-th row $A_i$ of A.

Now suppose we interchange $A_s$ and $A_t$. Since the s-th and t-th rows of AB have the form

\[ (AB)_s = (A_s B^{(1)}, \dots, A_s B^{(n)}), \qquad (AB)_t = (A_t B^{(1)}, \dots, A_t B^{(n)}) , \]

it follows that they are also interchanged when we interchange $A_s$ and $A_t$. Thus, by Theorem 1,

\[
\mathcal{D}_B[\dots, A_t, \dots, A_s, \dots] = \det[\dots, (AB)_t, \dots, (AB)_s, \dots]
= -\det[\dots, (AB)_s, \dots, (AB)_t, \dots] = -\det AB = -\mathcal{D}_B(A) .
\]

Thus, both conditions of Theorem 2 §1 are fulfilled, and hence $\mathcal{D}_B(A) = \mathcal{D}_B(E) \cdot \det A$. But, by definition, $\mathcal{D}_B(E) = \det EB = \det B$. This gives us the desired formula. □
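A quick numerical check of Theorem 5 follows. This is only a sketch with sample matrices chosen here; `matmul` is a plain triple-sum implementation of the matrix product, not a library routine.

```python
def det(a):
    """Determinant by expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total

def matmul(a, b):
    """Entry (i, j) of the product is the sum over k of a[i][k] * b[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 1, 3], [0, -1, 4], [5, 2, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
product_det = det(matmul(A, B))
print(product_det, det(A) * det(B))
```

Although AB and BA are in general different matrices, the theorem implies that their determinants coincide, which the same helpers can confirm.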


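5. Building up a theory of determinants. Theorems 1 and 2 of §1 essentially give us an axiomatic description of the det function, even though we started out by defining det by an explicit construction.

We shall now give another method which can be used to construct a theory of determinants. Namely, suppose we have a function $D: M_n(\mathbb{R}) \to \mathbb{R}$ with the following properties:

(i) $D(AB) = D(A) \cdot D(B)$ for any matrices $A, B \in M_n(\mathbb{R})$;

(ii) $D(F_{s,t}) = -1$ for every elementary matrix $F_{s,t}$ (see subsection 4 of §4 Ch. 2);

(iii) $D(A) = \lambda$ for any upper triangular matrix of the form

\[
A = \begin{pmatrix}
\lambda & & & * \\
 & 1 & & \\
 & & \ddots & \\
0 & & & 1
\end{pmatrix} .
\]

We claim that D = det. To prove this, we first apply properties (i) and (ii) to the elementary matrix

\[ F_s(\lambda) = \mathrm{diag}(1, \dots, \lambda, \dots, 1) = F_{1,s}\, F_1(\lambda)\, F_{1,s} \qquad (s \ne 1) . \]

We find (using (iii) for $F_1(\lambda)$) that $D(F_s(\lambda)) = (-1) \cdot \lambda \cdot (-1) = \lambda$, $\lambda \ne 0$. According to (iii), $D(F_{s,t}(\lambda)) = 1$ for an elementary matrix $F_{s,t}(\lambda)$ with s < t. Since

\[ F_{t,s}(\lambda) = F_{s,t}\, F_{s,t}(\lambda)\, F_{s,t} , \]

it follows that $D(F_{t,s}(\lambda))$ is also 1, and so $D(F_{s,t}(\lambda)) = 1$ for any indices $s \ne t$. Furthermore, for r < n,

\[ \begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix} = F_{r+1}(0) \cdots F_n(0) , \]

where $F_s(0)$ denotes the diagonal matrix obtained from E by replacing the s-th diagonal entry by 0; the computation above with $\lambda = 0$ gives $D(F_s(0)) = 0$, and hence

\[ D\begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix} = \begin{cases} 0 & \text{if } r < n , \\ 1 & \text{if } r = n . \end{cases} \]

Thus, $D(F_{s,t}) = -1 = \det F_{s,t}$, $D(F_{s,t}(\lambda)) = 1 = \det F_{s,t}(\lambda)$, and $D(F_s(\lambda)) = \lambda = \det F_s(\lambda)$. Since any matrix $A \in M_n(\mathbb{R})$ can be written in the form

\[ A = P \begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix} Q , \]

where P and Q are products of elementary matrices (see the argument before Theorem 5 of §4 Ch. 2), property (i) enables us to conclude that D(A) = det A.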


It would be a good idea for the reader to try to suggest and justify 


his own version of an axiomatic description of the det function. 


EXERCISES 


1. The integers 1798 = 31·58, 2139 = 31·69, 3255 = 31·105, 4867 = 31·157 are divisible by 31. Without any computations, prove that the following fourth order determinant is also divisible by 31:

\[
\begin{vmatrix}
1 & 7 & 9 & 8 \\
2 & 1 & 3 & 9 \\
3 & 2 & 5 & 5 \\
4 & 8 & 6 & 7
\end{vmatrix} .
\]

2. Show that every fourth order skew-symmetric determinant $|a_{ij}|$ with $a_{ij} \in \mathbb{Z}$ is the square of an integer. (Note. This is true for a skew-symmetric determinant of any order.)

3. Prove that $\det AB = \det A \cdot \det B$ (Theorem 5) by performing elementary transformations of type (II) on the rows of the auxiliary matrix $C = \begin{pmatrix} E & B \\ -A & 0 \end{pmatrix}$ of size 2n × 2n in such a way as to reduce it to the form $\bar C = \begin{pmatrix} E & B \\ 0 & AB \end{pmatrix}$.

4. Show that ${}^t(AB) = {}^tB\, {}^tA$ for any m×r matrix A and r×n matrix B.

5. Show that $\det BAB^{-1} = \det A$ for any $A \in M_n(\mathbb{R})$ and any invertible $B \in M_n(\mathbb{R})$.

6. Let

\[
C_n = \begin{pmatrix}
a_1 & 1 & 0 & \cdots & 0 & 0 \\
-1 & a_2 & 1 & \cdots & 0 & 0 \\
0 & -1 & a_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & a_{n-1} & 1 \\
0 & 0 & 0 & \cdots & -1 & a_n
\end{pmatrix} .
\]

Show that $\det C_n = a_n \det C_{n-1} + \det C_{n-2}$. If $a_1 = a_2 = \cdots = a_n = 1$, compute the value of $\det C_n$.

7. Show that the determinant of the n×n matrix

\[
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 \\
0 & -1 & 2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 2 & -1 \\
0 & 0 & 0 & \cdots & -1 & 2
\end{pmatrix}
\]

is equal to n+1.


§3. Applications of determinants


1. Criterion for a matrix to be non-singular. In §3 Ch. 2, we said that a square matrix A is called non-singular if it has an inverse $A^{-1}$. If we apply Theorem 5 of §2 to the relation $AA^{-1} = A^{-1}A = E$, we see that $\det A \cdot \det A^{-1} = 1$. Thus, the determinant of a non-singular matrix is non-zero, and

\[ \det A^{-1} = (\det A)^{-1} . \]

Given a matrix A, we may consider its classical adjoint matrix

\[
A^{\vee} = \begin{pmatrix}
A_{11} & A_{21} & \cdots & A_{n1} \\
\vdots & \vdots & & \vdots \\
A_{1n} & A_{2n} & \cdots & A_{nn}
\end{pmatrix} .
\]

The matrix $A^{\vee}$ is obtained by replacing each entry $a_{ij}$ in A by the corresponding cofactor $A_{ij}$ and then taking the transpose matrix.

THEOREM 1. A matrix $A \in M_n(\mathbb{R})$ is non-singular (invertible) if and only if $\det A \ne 0$. If $\det A \ne 0$, then $A^{-1} = (\det A)^{-1} A^{\vee}$, i.e.,

\[
A^{-1} = \begin{pmatrix}
\dfrac{A_{11}}{\det A} & \cdots & \dfrac{A_{n1}}{\det A} \\
\vdots & & \vdots \\
\dfrac{A_{1n}}{\det A} & \cdots & \dfrac{A_{nn}}{\det A}
\end{pmatrix} .
\]

To prove this theorem we first need a lemma.

LEMMA. Let $A \in M_n(\mathbb{R})$. Then:

\[ a_{i1} A_{j1} + a_{i2} A_{j2} + \cdots + a_{in} A_{jn} = \delta_{ij} \det A , \qquad (1) \]

\[ a_{1i} A_{1j} + a_{2i} A_{2j} + \cdots + a_{ni} A_{nj} = \delta_{ij} \det A , \qquad (2) \]

where $\delta_{ij}$ is the Kronecker symbol (if $i \ne j$, this can be thought of as expanding the determinant using the cofactors of the wrong row or the wrong column, respectively).


Proof. If i=j, the lemma coincides with Theorems 1 and 1' of §2. So suppose that $i \ne j$, in which case $\delta_{ij} = 0$. To prove the lemma in this case, we introduce the matrix

\[
A' = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix} ,
\]

which is obtained from $A = [\dots, A_i, \dots, A_j, \dots]$ by replacing the j-th row by the i-th (keeping the i-th row in place). As always with a square matrix with two identical rows, we have det A' = 0. On the other hand, the cofactor $A'_{jk}$ ($k = 1, \dots, n$) is obtained by crossing out the j-th row $A'_j = A_i$ and the k-th column of the determinant; hence, $A'_{jk} = A_{jk}$. If we expand the determinant of $A' = (a'_{kl})$ along the j-th row, we obtain the relation

\[ 0 = \det A' = \sum_{k=1}^{n} a'_{jk} A'_{jk} = \sum_{k=1}^{n} a_{ik} A_{jk} , \]

which is precisely equation (1) in the lemma. The second equality in the lemma is obtained by the same argument, applied to the columns. □

Proceeding to the proof of the theorem, we simply note that the left side of (1) is nothing other than the entry $c_{ij}$ in the matrix $C = AA^{\vee}$:

\[
C = \begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix}
A_{11} & \cdots & A_{n1} \\
\vdots & & \vdots \\
A_{1n} & \cdots & A_{nn}
\end{pmatrix} .
\]

Using (1), we have $C = (\delta_{ij} \det A) = (\det A)E$. Thus,

\[ AA^{\vee} = (\det A)E , \]

so that when $\det A \ne 0$ we obtain

\[ A\left[(\det A)^{-1} A^{\vee}\right] = E . \]

The left side of (2) is the entry $c'_{ji}$ in the matrix $C' = A^{\vee}A$. Since the right sides of (1) and (2) are the same, we arrive at the following equalities when $\det A \ne 0$:

\[ A\left[(\det A)^{-1} A^{\vee}\right] = \left[(\det A)^{-1} A^{\vee}\right] A = E , \]

and so we have $A^{-1} = (\det A)^{-1} A^{\vee}$. □
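The relation between a matrix, its classical adjoint and its inverse can be illustrated as follows. This is a sketch under stated assumptions: indices are 0-based, the function names (`adjugate`, `inverse`) are chosen here, and `fractions.Fraction` is used so that the inverse is exact.

```python
from fractions import Fraction

def minor_matrix(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by expansion along the first column (empty matrix: 1)."""
    n = len(a)
    if n == 0:
        return 1
    return sum((-1) ** i * a[i][0] * det(minor_matrix(a, i, 0)) for i in range(n))

def cofactor(a, i, j):
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))

def adjugate(a):
    """Classical adjoint: the entry in position (i, j) is the cofactor A_ji."""
    n = len(a)
    return [[cofactor(a, j, i) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse(a):
    """Inverse via (det A)^(-1) times the classical adjoint; needs det A != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(x, d) for x in row] for row in adjugate(a)]

A = [[2, 1, 3], [0, -1, 4], [5, 2, 1]]
scaled_identity = matmul(A, adjugate(A))   # equals (det A) * E
identity = matmul(A, inverse(A))
print(scaled_identity, identity)
```

As remarked in the text below, this formula is mainly of theoretical interest; for large matrices a reduction method is usually preferable.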


id) 


COROLLARY 1. The determinant vanishes if and only if the rows (or columns) are linearly dependent.

Proof. This criterion, part of which has already been given (see subsection 3 of §2), could have been proved much earlier, but we had no need of it. To prove the criterion, recall that by Theorem 1 we have det A = 0 if and only if A is singular, and, by Theorem 4 of §3 Ch. 2, this is equivalent to the condition rank A < n (where A is an n×n matrix). Finally, by Theorem 1 of §2 Ch. 2, this condition precisely characterizes those n×n matrices with linearly dependent rows (or columns). □

Theorem 1 is of greater theoretical than practical value. From a computational point of view, especially for large matrices, it is usually more convenient to use (P,Q)-reduction (see the corollary to Theorem 5 of §4 Ch. 2) to find $A^{-1}$.

We now derive formulas for solving a system of n linear equations with n unknowns, which, after all, was one of our main reasons for developing the theory of determinants in the first place.


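COROLLARY 2 (Cramer's rule). If a linear system

\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 , \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= b_n
\end{aligned}
\]

has non-zero determinant (i.e., $\det(a_{ij}) \ne 0$), then its unique solution is given by the formulas

\[
x_k = \frac{D_k}{D} = \frac{1}{\det(a_{ij})}
\begin{vmatrix}
a_{11} & \cdots & b_1 & \cdots & a_{1n} \\
\vdots & & \vdots & & \vdots \\
a_{n1} & \cdots & b_n & \cdots & a_{nn}
\end{vmatrix} , \qquad k = 1, 2, \dots, n
\]

(the numerator $D_k$ is obtained by replacing the k-th column in $D = \det(a_{ij})$ by the column of free terms).

Proof. By Theorem 1, the matrix $A = (a_{ij})$ is invertible. Hence, if we write our system in the form AX = B, we obtain, as in §4 of Chapter 2,

\[ X = A^{-1} B = (\det A)^{-1} A^{\vee} B , \]

and hence

\[ x_k = \frac{\sum_{j=1}^{n} A_{jk} b_j}{\det A} . \]

It is this expression in the numerator which we obtain by expanding the determinant $D_k$ along the k-th column (see (2)). Thus, any solution $x = (x_1, \dots, x_n)$ must be given by the formula in the corollary.

If we go through all of these steps backwards, we see that $(D_1/\det A, \dots, D_n/\det A)$ actually is the solution of our system. □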
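Cramer's rule translates directly into code for small systems. The following is a minimal sketch, not an efficient solver: the function name `cramer` and the sample system are assumptions made here, and exact fractions are used for the quotients $D_k / D$.

```python
from fractions import Fraction

def det(a):
    """Determinant by expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total

def cramer(a, b):
    """Solve a x = b for a square integer matrix a with non-zero determinant."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule requires a non-zero determinant")
    solution = []
    for k in range(len(a)):
        # D_k: replace the k-th column of a by the column of free terms b.
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k), d))
    return solution

A = [[2, 1, 3], [0, -1, 4], [5, 2, 1]]
b = [1, -6, 8]          # chosen so that the solution is (1, 2, -1)
x = cramer(A, b)
print(x)
```

Each call computes n+1 determinants, which is why the rule is convenient only for small n.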


Note that the formulas (3) and (9) of §4 Ch. 1 coincide with Cramer's rule for n=2 and 3. Although convenient for small n, for large n Cramer's rule has a largely theoretical value. For example, if we apply it to the linear system in example 2 of subsection 4 of §3 Ch. 1, we obtain the following expression for the n-th Fibonacci number (using the fact that det A = 1):


ie Olam 0) eae Oo @ wt 

eo Oo @ @ i 
pe Sit eat al @ @ il 
n : . . aoe 


This is clearly a long way off from the nice, explicit expression for $f_n$ which we found at the end of §3 Ch. 2.


2. Computing the rank of a matrix. In §§2 and 4 of Ch. 2, we found how to give a complete description of the set of solutions of a general rectangular system of linear equations. The notion of the rank of a matrix played an important role in this description. If we translate this notion into the language of determinants, we shall have at our disposal both another method for computing the rank and a convenient way of expressing linear independence of a set of vectors in the vector space $\mathbb{R}^m$.


Thus, let

\[
A = \begin{pmatrix}
a_{11} & \cdots & a_{1r} & \cdots & a_{1n} \\
\vdots & & \vdots & & \vdots \\
a_{r1} & \cdots & a_{rr} & \cdots & a_{rn} \\
\vdots & & \vdots & & \vdots \\
a_{m1} & \cdots & a_{mr} & \cdots & a_{mn}
\end{pmatrix}
\]

be any m×n rectangular matrix with entries $a_{ij} \in \mathbb{R}$. By a k-th order minor of A we mean the determinant of any matrix obtained by taking the intersection of k rows and k columns, where $k \le \min(m, n)$.

Suppose that A has rank r. By Theorem 1 of §2 Ch. 2, this means that r is the maximal number of linearly independent rows in A, and also the maximal number of linearly independent columns in A. If we now use Theorem 5 of §4 Ch. 2 and its corollary, we can write

\[ A = B \begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix} C , \]

where B and C are non-singular m×m and n×n matrices, respectively, which are written as a product of elementary matrices. Since the matrix $\begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix}$ has the non-zero r-th order minor $M = |E_r| = 1$, but does not have any non-zero minor of order > r, and since this property is preserved when elementary transformations are performed on the rows and columns, we have proved the following

THEOREM 2. The rank of an m×n matrix A is equal to the maximal order of a non-zero minor. □


Any non-zero minor of maximal order in A is called a basis minor. The columns (or rows) of A which intersect a given basis minor are called basis columns (respectively, basis rows), in agreement with the terminology in Chapter 2. As before, if we interpret the rows and columns of an m×n matrix A as vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, and if we use the basic properties of a linearly independent set of vectors (the fact that it can be completed to a basis; see Exercise 5 of §1 Ch. 2), we easily see that the search for a basis minor can be much simplified if we successively look for higher order minors containing non-zero lower order ones. Namely, if we have any non-zero k-th order minor M in A, then the next step consists only in checking those (k+1)-th order minors which contain M, i.e., from which M is obtained by crossing out a row and column. If all of these (k+1)-th order minors vanish, then rank A = k. (Why? By Theorem 2, this would mean that every column of A can be expressed as a linear combination of the k columns which intersect M.) If not all of these minors vanish, then take any which is non-zero and then proceed to check all (k+2)-th order minors containing it.

This method for determining the rank is rather practical, especially when we want to know not only the rank, but also a maximal linearly independent set of rows or columns of A. Of course, we have to be careful to remember that this information is lost if we perform elementary transformations on the matrix.
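The search for a basis minor described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `rank_by_bordering` is hypothetical, indices are 0-based, and the entries are integers so that the test for a non-zero minor is exact.

```python
def det(a):
    """Determinant by expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total

def rank_by_bordering(a):
    """Return (rank, basis rows, basis columns) by enlarging a non-zero minor."""
    rows, cols = len(a), len(a[0])
    # Start from any non-zero entry, i.e., a non-zero minor of order 1.
    start = next(((i, j) for i in range(rows) for j in range(cols)
                  if a[i][j] != 0), None)
    if start is None:
        return 0, [], []
    r_idx, c_idx = [start[0]], [start[1]]
    while True:
        enlarged = False
        for i in (i for i in range(rows) if i not in r_idx):
            for j in (j for j in range(cols) if j not in c_idx):
                rs, cs = sorted(r_idx + [i]), sorted(c_idx + [j])
                # Minor of order k+1 containing the current minor.
                if det([[a[p][q] for q in cs] for p in rs]) != 0:
                    r_idx, c_idx, enlarged = rs, cs, True
                    break
            if enlarged:
                break
        if not enlarged:
            # All bordering minors vanish, so the rank equals the current order.
            return len(r_idx), r_idx, c_idx

M = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
result = rank_by_bordering(M)
print(result)
```

Besides the rank, the function returns the indices of a set of basis rows and basis columns, which is the extra information the method is said to preserve.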


EXERCISES 


1, Show that the following relations hold: 
OSS TS) Pry ee 
$(A^{\vee})^{\vee} = (\det A)^{n-2} A$.

2. Express rank $A^{\vee}$ in terms of rank A.


3. Prove that a square system of homogeneous linear equations has non-trivial solutions if and only if the determinant of the system is zero.

4. Using the results of subsection 1 of §4 Ch. 2 and Corollary 2 of Theorem 1, show that a homogeneous system

\[
\begin{aligned}
a_{11} x_1 + \cdots + a_{1n} x_n &= 0 , \\
\cdots\cdots\cdots\cdots\cdots\cdots & \\
a_{n-1,1} x_1 + \cdots + a_{n-1,n} x_n &= 0
\end{aligned}
\]

of rank r = n-1 has as fundamental set of solutions the single column-vector

\[ X^0 = [D_1, -D_2, D_3, \dots, (-1)^{n-1} D_n] , \]

where $D_i$ is the determinant of the matrix obtained from $A = (a_{ij})$ by crossing out the i-th column. Thus, every solution has the form $X = \lambda X^0$.

5. Suppose that $A = (a_{ij}) \in M_n(\mathbb{R})$ and $(n-1)|a_{ij}| < |a_{ii}|$ for all $i \ne j$. Prove that $\det A \ne 0$.

6. Prove the following fact. Let $A = (a_{ij})$ and $B = (b_{kl})$ be matrices of size n×m and m×n, respectively, and let C = AB. Then

\[
\det C = \sum_{1 \le j_1 < j_2 < \cdots < j_n \le m}
\begin{vmatrix}
a_{1j_1} & a_{1j_2} & \cdots & a_{1j_n} \\
a_{2j_1} & a_{2j_2} & \cdots & a_{2j_n} \\
\vdots & \vdots & & \vdots \\
a_{nj_1} & a_{nj_2} & \cdots & a_{nj_n}
\end{vmatrix}
\cdot
\begin{vmatrix}
b_{j_1 1} & b_{j_1 2} & \cdots & b_{j_1 n} \\
b_{j_2 1} & b_{j_2 2} & \cdots & b_{j_2 n} \\
\vdots & \vdots & & \vdots \\
b_{j_n 1} & b_{j_n 2} & \cdots & b_{j_n n}
\end{vmatrix} .
\]

The summation on the right is over all $\binom{m}{n}$ combinations of n numbers $j_1, j_2, \dots, j_n$ from 1, 2, ..., m. In particular, $\det C = \det A \cdot \det B$ when m = n, and det C = 0 when n > m.

7. Using the preceding exercise, show that, if A is an m×n matrix, $m \ge n$, then

\[ \det({}^tA \cdot A) = \sum M^2 , \]

where M runs through all $\binom{m}{n}$ n-th order minors of A.


Chapter 4. Algebraic Structures— 
Groups, Rings, Fields 


The preceding chapters have provided us with a great deal of concrete material, 
which it is now time to consider from a more general point of view. For this purpose, we 
introduce and study (still on an elementary level) the concepts of groups, rings, and fields, 


which play a fundamental role in all of algebra. 


§1. Sets with algebraic operations


1. Binary operations. Let X be any set. An algebraic binary operation (or
composition law) on X is any fixed map τ: X × X → X of the cartesian square
X^2 = X × X to X. Thus, to every ordered pair (a,b) of elements a, b ∈ X there
corresponds a unique element τ(a,b) of the same set X. We sometimes write aτb
instead of τ(a,b), and in fact usually we introduce a special symbol *, ∘, ·, +, etc. to
designate a binary operation on X. In our examples we shall follow this practice, and shall
call a·b (or simply ab with no symbol between a and b) the product and a+b
the sum of the elements a, b ∈ X. In most cases one of these conventions will be
convenient.




It is possible for the same set X to have more than one binary operation defined on
it. When we choose one such operation, and want to think of X in conjunction with that
particular binary operation *, we write (X,*) and say that * determines an algebraic
structure on X or that (X,*) is an algebraic system. For example, in addition to the
usual operations + (addition) and · (multiplication) on the set Z of integers, it is
easy to give new operations made up from + (or -) and ·: n∘m = n+m-nm,
n*m = -n-m, etc. We thereby obtain different algebraic structures (Z,+), (Z,·),
(Z,∘), (Z,*).

Clearly, the imagination can find a boundless expanse in constructing all sorts of 
binary operations on a set X. But the problem of studying arbitrary algebraic structures 
is too general to lead to conclusions that have any concrete value. For this reason the 


problem is studied under certain restrictions to various special types of algebraic structures. 


2. Semigroups and monoids. A binary operation * on a set X is called
associative if (a*b)*c = a*(b*c) for all a, b, c ∈ X; it is called commutative if
a*b = b*a for all a, b ∈ X. We use the same terms to apply to the algebraic structure
(X,*). The conditions of associativity and commutativity are independent, i.e., neither
implies the other. For example, the operation * on Z given by n*m = -n-m is
obviously commutative, but it is not associative, since, for example, (1*2)*3 = (-1-2)*3
= -(-1-2)-3 = 0, while 1*(2*3) = 4. On the other hand, the set M_n(R) of all
n×n square matrices is associative but not commutative under multiplication (if n > 1),
as shown in subsection 2 of §3 Ch. 2.
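The independence of the two conditions can be checked mechanically on a small sample of integers. The following is only an illustrative sketch: the function name `star` and the sample range are choices made here, not notation from the text.

```python
from itertools import product

def star(n, m):
    """The operation n * m = -n - m on the integers."""
    return -n - m

sample = range(-3, 4)

# Commutativity: a*b == b*a for every pair in the sample.
commutative = all(star(a, b) == star(b, a)
                  for a, b in product(sample, repeat=2))

# Associativity: (a*b)*c == a*(b*c) for every triple in the sample.
associative = all(star(star(a, b), c) == star(a, star(b, c))
                  for a, b, c in product(sample, repeat=3))

left = star(star(1, 2), 3)    # (1*2)*3
right = star(1, star(2, 3))   # 1*(2*3)
print(commutative, associative, left, right)
```

The two values `left` and `right` are the ones computed by hand in the paragraph above; a finite sample can refute associativity but of course cannot prove commutativity in general.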

An element e ∈ X is called a unit element (or a neutral element) relative to a
given binary operation * if we have: e*x = x*e = x for all x ∈ X. If e' is
another unit element, then it follows immediately from the definition that e' = e'*e = e.
Thus, an algebraic structure (X,*) can have at most one unit element.


A set X with an associative binary operation is called a semigroup. A semigroup
having a unit element is called a monoid (or simply a semigroup with unit).

As with any set, the cardinality of a monoid M = (M,*) is denoted Card M or
|M|. If it has finitely many elements, we call M a finite monoid of order |M|.


We now give some examples of semigroups and monoids. 


1) Let Ω be any set, and let M(Ω) be the set of all transformations of Ω
(maps from Ω to itself). It follows from the properties of sets and maps in §5 Ch. 1
that M(Ω) is a monoid. Of course, we have in mind as our binary operation the
composition of maps ∘. M(Ω) has the unit element e_Ω, which is the identity map.

Consider the special case when Ω is a finite set of n = |Ω| elements, which we
simply denote by the integers 1, 2, ..., n. Every map f: Ω → Ω is determined by
giving an ordered sequence f(1), f(2), ..., f(n), where each f(i) is an element of Ω.
We allow the possibility that f(i) = f(j) for i ≠ j. There are precisely n^n possible
sequences, i.e., n^n transformations. Thus, |M(Ω)| = Card M(Ω) = n^n. For example,
take n = 2. The four elements e, f, g, h of the monoid M({1,2}) and their
products (compositions) are completely given by the tables


It is clear from the table to the right that M({1, 2}) is a non-commutative monoid. 
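The count n^n and the non-commutativity of M({1,2}) can be reproduced by brute force. This is a minimal sketch under its own conventions: a map f is stored as the tuple (f(1), ..., f(n)), the helper names `transformations` and `compose` are hypothetical, and no attempt is made to match the letters e, f, g, h used in the text.

```python
from itertools import product

def transformations(n):
    """All maps {1..n} -> {1..n}, each stored as the tuple (f(1), ..., f(n))."""
    return list(product(range(1, n + 1), repeat=n))

def compose(f, g):
    """Composition f o g: first apply g, then f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

maps = transformations(2)
identity = (1, 2)

# Full multiplication table of the monoid M({1, 2}).
table = {(f, g): compose(f, g) for f in maps for g in maps}

# Pairs that witness non-commutativity.
noncommuting = [(f, g) for f in maps for g in maps
                if table[f, g] != table[g, f]]

print(len(maps), len(transformations(3)))
print(noncommuting[0])
```

For instance, the two constant maps do not commute: composing them in either order returns the map applied last.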


2) Again let Ω be an arbitrary set, and let P(Ω) be the set of all subsets (see
Exercise 4 of §5 Ch. 1). Since (A∩B)∩C = A∩(B∩C) and (A∪B)∪C = A∪(B∪C),
it follows that there are two natural associative binary operations defined on P(Ω).
Obviously, ∅∪A = A and A∩Ω = A. So we have two commutative monoids
(P(Ω),∪,∅) and (P(Ω),∩,Ω), where we denote a monoid as a triple (set, binary
operation, unit element). We know that |P(Ω)| = 2^n if |Ω| = n.


3) (M_n(R),+,0) is a commutative monoid whose neutral element is the zero
matrix, and (M_n(R),·,E) is a non-commutative monoid whose neutral element is the
identity matrix. This follows immediately from the properties of matrix addition and
multiplication which we encountered in Chapter 2.

4) Let nZ = {nm | m ∈ Z} be the set of integers divisible by n. It is clear that
(nZ,+,0) is a commutative monoid, and (nZ,·) is a commutative semigroup without
unit (if n > 1).


5) The set P_n(R) of stochastic matrices of order n (see Exercise 7 of §3
Ch. 2) is a monoid under the usual matrix multiplication.


A subset S' of a semigroup S with binary operation * is called a subsemigroup
if x*y ∈ S' whenever x, y ∈ S'. In this case we say that the subset S' ⊂ S is closed
under *. If (M,*) is a monoid, and the subset M' ⊂ M is not only closed under *,
but also contains the unit element, then we say that M' is a submonoid of M. For
example, (nZ,·) is a subsemigroup of (Z,·), and (nZ,+,0) is a submonoid of
(Z,+,0). Any submonoid of the monoid M(Ω) of maps from a set to itself is called a
monoid of transformations (of the set Ω).


3. Generalized associativity; powers. Let (X,·) be any algebraic structure.
For simplicity, we shall omit the · and write xy instead of x·y. Let x_1, ..., x_n
be an ordered sequence of elements in X. Without changing the order, there are many
different ways we can form the product of n elements. Let ℓ_n be the number of ways:

      ℓ_2 = 1:  x_1 x_2;
      ℓ_3 = 2:  (x_1 x_2)x_3,  x_1(x_2 x_3);
      ℓ_4 = 5:  ((x_1 x_2)x_3)x_4,  (x_1(x_2 x_3))x_4,  x_1((x_2 x_3)x_4),  x_1(x_2(x_3 x_4)),
                (x_1 x_2)(x_3 x_4);  and so on.


It is clear that we can obtain all ℓ_n possibilities if, for each k, 1 ≤ k ≤ n-1,
we run through all possible products of x_1, ..., x_k and all possible products of
x_{k+1}, ..., x_n, and then take the product of these two products. It is of great importance
that the location of the parentheses does not matter if (X,·) is a semigroup.
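The splitting rule just described (choose the position k of the outermost product, then bracket the two halves independently) gives a recurrence for the number of bracketings. The sketch below assumes that reading; the function name `bracketings` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def bracketings(n):
    """Number of ways to parenthesize an ordered product of n factors."""
    if n == 1:
        return 1  # a single factor needs no parentheses
    # Split as (x_1 ... x_k)(x_{k+1} ... x_n) for k = 1, ..., n-1.
    return sum(bracketings(k) * bracketings(n - k) for k in range(1, n))

counts = [bracketings(n) for n in range(2, 8)]
print(counts)
```

The first three values agree with the lists written out above for n = 2, 3, 4.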


THEOREM 1. If the binary operation on X is associative, then the result of
applying it successively to n elements of X does not depend on the location of the
parentheses.


Proof. If n = 1 or 2, there is nothing to prove. If n = 3, the theorem
coincides with the associative law. We now proceed by induction on n. Suppose that
n > 3 and that the theorem holds when the number of elements is < n. We need only show
that

      (x_1 ... x_k)(x_{k+1} ... x_n) = (x_1 ... x_l)(x_{l+1} ... x_n)      (1)

for any k and l, 1 ≤ k, l ≤ n-1. We have only written out the outside parentheses,
since, by the induction assumption, the location of the inner parentheses does not matter.
In particular, we can set x_1 x_2 ... x_k = (...((x_1 x_2)x_3)...x_{k-1})x_k, which is called the
left-normalized product. We distinguish between two cases:

a) k = n-1. Then (x_1 ... x_{n-1})x_n = (...((x_1 x_2)x_3)...x_{n-1})x_n is a left-normalized
product.

b) k < n-1. By associativity, we have

      (x_1 ... x_k)(x_{k+1} ... x_n) = (x_1 ... x_k)((x_{k+1} ... x_{n-1})x_n)
                                     = ((x_1 ... x_k)(x_{k+1} ... x_{n-1}))x_n
                                     = (...((x_1 x_2)x_3)...x_{n-1})x_n,

i.e., we again obtain a left-normalized product. The right side of the desired equality (1)
can be reduced to the same form. □




In §2 Ch. 2 we introduced the summation sign Σ x_i. It can be used in any additive
commutative monoid. The analogous symbol in a multiplicative monoid is the product sign:

\[
\prod_{i=1}^{2} x_i = x_1 x_2, \quad
\prod_{i=1}^{3} x_i = (x_1 x_2) x_3, \quad \ldots, \quad
\prod_{i=1}^{n} x_i = \Bigl(\prod_{i=1}^{n-1} x_i\Bigr) x_n .
\]


By Theorem 1, the parentheses are not needed when writing (or computing) a product
x_1 x_2 ... x_n of elements in a monoid. The only care that must be exercised is in the order
of the elements, and even that is unnecessary if the elements commute with one another. In
particular, if x_1 = x_2 = ... = x_n = x, then, as with ordinary numbers, we let x^n
denote the product xx...x, which we call the n-th power of the element x. As a
consequence of Theorem 1, we have the relations

      x^m x^n = x^{m+n},   (x^m)^n = x^{mn},   m, n ∈ N.      (2)


If (M,·,e) is a monoid, we further set x^0 = e for every x ∈ M.
In an additive monoid, i.e., in a commutative monoid in which the operation is denoted
by +, the "power" x^n is written nx = x + x + ... + x and is called a multiple of x.
Then (2) becomes the following rules for multiples:

      mx + nx = (m+n)x,   n(mx) = (nm)x.      (2')
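Because the bracketing of a product is irrelevant in a monoid (Theorem 1), a power x^n may be computed by any grouping of the factors, for example by repeated squaring. The sketch below is an illustration only; the names `power` and `matmul2` and the choice of 2×2 integer matrices and strings as sample monoids are assumptions made here.

```python
def power(x, n, op, e):
    """x^n in a monoid with operation op and unit e (n >= 0), by repeated squaring."""
    result, base = e, x
    while n > 0:
        if n & 1:                  # current binary digit of n is 1
            result = op(result, base)
        base = op(base, base)      # base runs through x, x^2, x^4, ...
        n >>= 1
    return result

def matmul2(a, b):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

E = ((1, 0), (0, 1))
A = ((1, 1), (0, 1))

m, n = 3, 4
lhs_sum = matmul2(power(A, m, matmul2, E), power(A, n, matmul2, E))   # x^m x^n
rhs_sum = power(A, m + n, matmul2, E)                                 # x^(m+n)
lhs_mul = power(power(A, m, matmul2, E), n, matmul2, E)               # (x^m)^n
rhs_mul = power(A, m * n, matmul2, E)                                 # x^(mn)

word = power("ab", 3, lambda s, t: s + t, "")   # strings under concatenation
print(lhs_sum == rhs_sum, lhs_mul == rhs_mul, word)
```

The convention x^0 = e is what makes the starting value `result = e` correct.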
We note another useful fact. If xy = yx in a monoid M, then

      (xy)^n = x^n y^n,   n = 0, 1, 2, ... .      (3)

In particular, this is always the case in a commutative monoid. (3) is proved by induction
on n:

      (xy)^n = (xy)^{n-1}(xy) = (x^{n-1} y^{n-1})(xy) = x^{n-1}(y^{n-1} x)y
             = x^{n-1}(x y^{n-1})y = (x^{n-1} x)(y^{n-1} y) = x^n y^n.

More generally, if we use (3) and induction on m, we obtain

      (x_1 x_2 ... x_m)^n = x_1^n x_2^n ... x_m^n,   n = 0, 1, 2, ...,      (4)

provided that x_i x_j = x_j x_i for all i, j.




The analogous rules for multiples are:

      n(x + y) = nx + ny,   n = 0, 1, 2, ...;      (3')

      n(x_1 + x_2 + ... + x_m) = nx_1 + nx_2 + ... + nx_m,   n = 0, 1, 2, ... .      (4')


Normally, a monoid which is written (M,·,e) is called a multiplicative monoid, and one
which is written (M,+,0) is called an additive monoid. The additive notation is usually 


only used for commutative monoids. 


4. Invertible elements. An element a in a monoid (M,·,e) is called
invertible if there exists an element b ∈ M such that ab = e = ba. (Clearly, in that
case b is also invertible.) If we also had ab' = e = b'a, then it would follow that
b' = eb' = (ba)b' = b(ab') = be = b. Thus, we can speak of the inverse element a^{-1}
when a is invertible: a^{-1}a = e = aa^{-1}.

It is clear that (a^{-1})^{-1} = a. The notion of an invertible element in a monoid is a
natural generalization of the notion of an invertible matrix in the multiplicative monoid
(M_n(R),·,E).

Since (xy)(y^{-1}x^{-1}) = x(yy^{-1})x^{-1} = xex^{-1} = e, and similarly
(y^{-1}x^{-1})(xy) = e, we have: (xy)^{-1} = y^{-1}x^{-1}. Hence, the set of all invertible elements
in a monoid (M,·,e) is closed under · and is a submonoid of M.
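As a concrete check, one can list the invertible elements of the transformation monoid M({1,2,3}) from example 1) of subsection 2 and confirm that they form a submonoid. This is a brute-force sketch; the tuple encoding of maps and the helper names `compose` and `inverse_of` are assumptions of the example.

```python
from itertools import product, permutations

def compose(f, g):
    """Composition f o g of maps stored as tuples (f(1), ..., f(n))."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

n = 3
maps = list(product(range(1, n + 1), repeat=n))   # the monoid M({1, 2, 3})
e = tuple(range(1, n + 1))                        # the identity map

def inverse_of(f):
    """The two-sided inverse of f, found by exhaustive search (f must be invertible)."""
    return next(g for g in maps if compose(f, g) == e and compose(g, f) == e)

invertible = [f for f in maps
              if any(compose(f, g) == e and compose(g, f) == e for g in maps)]

# Closure under composition, and the rule (xy)^{-1} = y^{-1} x^{-1}.
closed = all(compose(f, g) in invertible for f in invertible for g in invertible)
reversal = all(inverse_of(compose(f, g)) == compose(inverse_of(g), inverse_of(f))
               for f in invertible for g in invertible)
print(len(maps), len(invertible), closed, reversal)
```

The invertible maps found this way are exactly the bijections of {1,2,3}.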


EXERCISES 


1. Subsection 2 contains the example of the commutative but non-associative
operation *: n*m = -n-m on Z. The algebraic system (Z,*) has the following
identities: (n*m)*m = n, m*(m*n) = n. Now suppose that we are given an arbitrary
algebraic system (X,*) in which (x*y)*y = x and y*(y*x) = x for all
x, y ∈ X. Prove that x*y = y*x, i.e., * is commutative.




2. Show that 


is a Semigroup under the usual operation of matrix multiplication. Is ama), *) a 
n 


monoid? 


3. In a multiplicative monoid M we choose an arbitrary element t and introduce
the new operation *: x*y = xty. Show that (M,*) is a semigroup, and that
(M,*) is a monoid if and only if t is invertible, in which case the neutral element is t^{-1}.


4. Show that the set Z with the operation ∘: n∘m = n+m+nm =
(1+n)(1+m) - 1 is a commutative monoid. What is the neutral element in (Z,∘)?
Find all invertible elements in (Z,∘).


§2. Groups 


1. Definition and examples. Consider the set GL(n, R) of all square n×n
matrices with non-zero determinant. By Theorem 5 of §2 Ch. 3, if det A ≠ 0 and
det B ≠ 0, then det AB ≠ 0. Thus, A, B ∈ GL(n, R) ⟹ AB ∈ GL(n, R). In addition,
(AB)C = A(BC), and there is a special matrix E such that AE = EA = A for all
A ∈ GL(n, R). Finally, every matrix A ∈ GL(n, R) has an "opposite", i.e., an
inverse A^{-1} such that AA^{-1} = A^{-1}A = E.

The set GL(n, R) considered with the composition law (binary operation)
(A,B) ↦ AB is called the general linear group of order n over R. Following the
terminology in §1, we could define GL(n, R) as simply the submonoid of all invertible
elements in the monoid (M_n(R),·,E). But this submonoid is extremely important in its
own right, and is a key example of the following abstract definition.
Definition. A monoid G all of whose elements are invertible is called a group.


In other words, the following axioms must hold: 




(G1) a binary operation (x,y) ↦ xy is defined on the set G;
(G2) this operation is associative: (xy)z = x(yz) for all x, y, z ∈ G;
(G3) G has a neutral (unit) element e: xe = ex = x for all x ∈ G;
(G4) every element x ∈ G has an inverse x^{-1}: xx^{-1} = x^{-1}x = e.


Surprisingly, one of the oldest and richest areas of algebra, playing a fundamental 
role in geometry and in applications of mathematics to the natural sciences, is based on such 
a simple set of axioms. 

A group whose binary operation is commutative is called a commutative group or else 
an abelian group (in honor of the Norwegian mathematician Abel). The term "group" itself 
was introduced by the French mathematician Galois, the founder of group theory. The ideas 
of group theory were "in the air" (as often happens with fundamental mathematical ideas) 
long before Galois; some of the theorems of group theory were actually proved, although in 
a more naive form, by Lagrange. The brilliant work of Galois was at first poorly
understood, and its importance became fully recognized only after the appearance of Jordan's book
"A course in the theory of permutations and algebraic equations" (1870). It was only toward
the end of the nineteenth century that group theory "completely left the realm of fantasy.
Instead, a logical skeleton was carefully prepared" (F. Klein, "Lectures on the development
of mathematics in the nineteenth century").


The symbols Card G, |G|, and (G:e) are all used to denote the number of
elements in a group (its cardinality). All of the facts about monoids in §1 apply, of course,
to groups. However, some new words are introduced. A subset H ⊂ G is called a
subgroup of G if e ∈ H; h_1, h_2 ∈ H ⟹ h_1 h_2 ∈ H; and h ∈ H ⟹ h^{-1} ∈ H. A
subgroup H ⊂ G is called proper if H ≠ G.

We now give some examples of groups. 


1) In the general linear group GL(n, R), consider the subset SL(n, R) of
matrices with determinant 1:

      SL(n, R) = {A ∈ GL(n, R) | det A = 1}.

Obviously, E ∈ SL(n, R). By the results in Ch. 3 about determinants, det A = 1,
det B = 1 ⟹ det AB = 1 and det A^{-1} = (det A)^{-1} = 1. Thus, SL(n, R) is a subgroup
of GL(n, R); it is called the special linear group of order n over R. It is also called
the unimodular group, although this name is sometimes used for the group of matrices with
determinant ±1.

The group GL(n, IR), which contains many interesting groups, has been for 


mathematicians of several generations a seemingly inexhaustible source of new ideas and 


unsolved problems. 


2) If we replace the real numbers by the rational numbers, we obtain the general
linear group GL(n, Q) of order n over Q and the subgroup SL(n, Q). The group
SL(n, Q) contains the interesting subgroup SL(n, Z) of matrices
with integer entries and determinant 1. Theorem 1 of §3 Ch. 3,
which gives an explicit formula for the entries in an inverse matrix,
shows that SL(n, Z) actually is a group. The groups SL(n, Q)
and SL(n, Z) occupy an important place in number theory.

Figure 11 depicts the partially ordered set (see subsection 3 of
§6 Ch. 1) of these subgroups of GL(n, R).

[Figure 11: diagram of the subgroups of GL(n, R) ordered by inclusion]


3) If we set n = 1 in examples 1) and 2), we obtain, first of all, the
multiplicative groups R* = R\{0} = GL(1, R) and Q* = Q\{0} = GL(1, Q) of real and
rational numbers, respectively. These are obviously infinite groups. Since the only
invertible elements in (Z,·,1) are 1 and -1, we have GL(1, Z) = {±1}.
Furthermore, SL(1, R) = SL(1, Q) = SL(1, Z) = 1. But if n = 2, even the group
SL(2, Z) is infinite: it contains, for example, all of the matrices

\[
\begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 \\ m & 1 \end{pmatrix}, \qquad m \in \mathbf{Z}.
\]

We further note the infinite additive groups:

      (R,+,0),   (Q,+,0),   (Z,+,0).


4) Let Ω be any set, and let S(Ω) be the set of all bijective (one-to-one)
transformations f: Ω → Ω. Using the results of §5 Ch. 1 on set maps (Theorems 1
and 2 and the corollary to Theorem 2), we immediately conclude that S(Ω) is a group
under the natural binary operation of composing maps. S(Ω) is the submonoid of all
invertible elements in the monoid M(Ω) in example 1) of §1. The group S(Ω) and its
various subgroups are called transformation groups. Such groups are the basic type of
groups that arise in applications of group theory. In 1872, F. Klein announced his
"Erlangen program", which sought to classify different geometries using the notion of
transformation groups. If we take Ω = R^n, we obtain a very large group S(R^n), which
is difficult to imagine in its entirety. But S(R^n) contains the subgroup of invertible
(bijective) linear transformations φ_A: R^n → R^n, which we found to be in one-to-one
correspondence with the n×n non-singular matrices A (see §3 Ch. 2). Thus, we
have an imbedding of GL(n, R) in S(R^n). The significance of this imbedding will become
clearer after we introduce the important concept of an isomorphism of groups.


2. Systems of generators. Given a subset S of a group G, we try to find a
subgroup H ⊂ G containing S such that every subgroup K ⊂ G containing S must
also contain H. There cannot be two distinct subgroups H, H' which play the role of a
minimal subgroup containing S:

      S ⊂ H ⟹ H' ⊂ H,   S ⊂ H' ⟹ H ⊂ H',   so that H = H'.

Thus, the minimal subgroup H containing S must coincide with the intersection
of all subgroups containing S, if we show that this intersection must be a subgroup in G.
But we have the following simple result.

THEOREM 1. The intersection ∩_{i∈I} H_i of any family {H_i | i ∈ I} of subgroups
of a group G is a subgroup.




Proof. Let e be the unit element of G. The properties e ∈ ∩ H_i;
x, y ∈ ∩ H_i ⟹ xy ∈ ∩ H_i; and x ∈ ∩ H_i ⟹ x^{-1} ∈ ∩ H_i, which characterize a subgroup, must
hold in ∩ H_i, because they hold in each subgroup H_i separately. □


Now take for {H_i | i ∈ I} the family of all subgroups containing the given subset
S ⊂ G. Then, by Theorem 1 and the remarks before the theorem, the intersection

      ⟨S⟩ = ∩_{S ⊂ H} H

is precisely the minimal subgroup containing S. We call ⟨S⟩ the subgroup generated
by S in G, and we call S a set of generators for the subgroup ⟨S⟩. At first
glance, it seems that ⟨S⟩ is defined ineffectively, since we have to find all subgroups
containing S. But there is an easier way to determine ⟨S⟩, as we see from the
following corollary of Theorem 1.


COROLLARY. The subgroup ⟨S⟩ coincides with the set T consisting of the
unit element e and all possible products

      t_1 t_2 ... t_n,   n = 1, 2, 3, ...,

where either t_i ∈ S or t_i^{-1} ∈ S, 1 ≤ i ≤ n.

Proof. Since t = t_1 ... t_n ∈ T, t' = t'_1 ... t'_m ∈ T ⟹ tt' = t_1 ... t_n t'_1 ... t'_m ∈ T
and t = t_1 ... t_n ∈ T ⟹ t^{-1} = t_n^{-1} ... t_1^{-1} ∈ T, it
follows that the set T is a subgroup of G. On the other hand, every subgroup H
containing all x_i ∈ S must contain all of the inverses x_i^{-1} and, hence, must contain
all products of the form t_1 t_2 ... t_n. Hence, H ⊃ T, and T coincides with the
intersection of all such subgroups. □
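For a finite group the corollary turns into a procedure: start from e and keep multiplying by generators and their inverses until nothing new appears. The sketch below uses permutations (elements of the transformation groups of example 4) stored as tuples (p(1), ..., p(n)); the function names are hypothetical and the approach is a plain closure search, not an efficient algorithm.

```python
def compose(p, q):
    """Product pq with (pq)(i) = p(q(i)); p is stored as (p(1), ..., p(n))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    """The inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p, start=1):
        inv[image - 1] = i
    return tuple(inv)

def generated_subgroup(generators, identity):
    """All products t_1 ... t_n with each t_i a generator or the inverse of one."""
    letters = set(generators) | {inverse(g) for g in generators}
    subgroup = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for t in letters:
            y = compose(x, t)          # extend the product by one more factor
            if y not in subgroup:
                subgroup.add(y)
                frontier.append(y)
    return subgroup

e = (1, 2, 3)
swap12 = (2, 1, 3)       # interchanges 1 and 2
swap13 = (3, 2, 1)       # interchanges 1 and 3
rotation = (2, 3, 1)     # 1 -> 2 -> 3 -> 1

whole = generated_subgroup([swap12, swap13], e)
cyclic = generated_subgroup([rotation], e)
print(len(whole), len(cyclic))
```

The search terminates because the ambient group is finite; for an infinite group the same set T is still well defined, but it cannot be enumerated this way.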
It should be noted that by no means are all of the products t_1 t_2 ... t_n distinct
elements of ⟨S⟩, even if one agrees (as one usually does) to cancel any pair of successive
t_i of the form xx^{-1} or x^{-1}x. In general, when |S| > 1, the question of when two
products of the form t_1 t_2 ... t_n are equal is a difficult one, and we shall only briefly
discuss it in Chapter 7.

Every group G has some set of generators S: for example, we can take S to
be the whole group G. For simplicity, we consider a group G which is generated by a
finite set of elements (such groups are said to be finitely generated). If we remove from S
all "extra" elements, i.e., those which can be written as products of the other elements
(and their inverses), we obtain a minimal set M of generators of G. To say that M
is minimal means that ⟨M⟩ = G, but ⟨M'⟩ ≠ G if M' is obtained from M by
removing an element. Let M = {g_1, g_2, ..., g_d}. Then we also write
G = ⟨g_1, g_2, ..., g_d⟩ as well as G = ⟨M⟩. If d = 1, we call G a cyclic group.


3. Cyclic groups. If G is any group and g is an element in G, then, by
definition, ⟨g⟩ is a cyclic subgroup of G.
Because of Theorem 1 and the properties of the powers of an element in a monoid,
we might expect that any cyclic group ⟨a⟩ with generator a is an abelian group of the
form ⟨a⟩ = {a^n | n ∈ Z}, or ⟨a⟩ = {na | n ∈ Z} if the group operation is written
additively (this notation is not meant to imply that all of the elements a^n or na are
distinct). This is in fact the case, once we agree to denote (a^{-1})^k = a^{-k} and prove the
following fact.

THEOREM 2. For any m, n ∈ Z,

      a^m a^n = a^{m+n},   (a^m)^n = a^{mn}

(or, in additive notation, ma + na = (m+n)a, n(ma) = (nm)a).


Proof. If m and n are non-negative, see relations (2) and (2') of subsection
3 of §1. If m < 0 and n < 0, then, setting m' = -m > 0 and n' = -n > 0, we have

      a^m a^n = (a^{-1})^{m'} (a^{-1})^{n'} = (a^{-1})^{m'+n'} = a^{-(m'+n')} = a^{m+n}.

If m' = -m > 0 and n > 0, we have

      a^m a^n = (a^{-1})^{m'} a^n = (a^{-1} ... a^{-1})(a ... a)
              = (a^{-1})^{m'-n} if m' ≥ n (or a^{n-m'} if n ≥ m')
              = a^{m+n}.

We similarly treat the case when m > 0 and n < 0. The equality (a^m)^n = a^{mn} is
easy to prove using the first equality, just proved, and the definition of the powers of an
element. □


The simplest example of a cyclic group is the additive group of integers (Z,+,0),
which is generated by 1 or by -1. Also, it is easy to see that the matrix
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
generates an infinite cyclic subgroup of SL(2, Z). The set {1,-1} under multiplication
is a cyclic group of order 2.

We can construct an example of a cyclic group of order n by considering all
rotations of the plane around a point O which take a regular n-gon P_n with center O
to itself. These rotations clearly form a group; the group operation is successively
performing the rotations. Our group C_n contains the rotations φ_0, φ_1, ..., φ_{n-1}
through the angles 0, 2π/n, ..., (n-1)2π/n, counterclockwise. Here φ_s = φ_1^s, and
it is geometrically obvious that φ_s^{-1} = φ_{n-s} and φ_1^n = φ_0, the identity transformation.
Thus, |C_n| = n, and C_n = ⟨φ_1⟩. Note that the cyclic group C_n is a proper subgroup
of the group D_n of all symmetries of the n-gon P_n (i.e., rigid transformations of
P_n).

Again suppose that G is any group and a is an element of G. There are two
possibilities: 1) All powers of a are distinct, i.e., m ≠ n ⟹ a^m ≠ a^n. In this case
we say that a has infinite order. 2) We have a^m = a^n for some m ≠ n. If, say, m > n,
then a^{m-n} = e, i.e., there is a positive power of a which is the identity element. Let
q be the least positive exponent for which a^q = e. We then say that a is an element of
finite order q. Of course, if G has finite order (i.e., Card G < ∞), then all of the
elements have finite order.


Warning. The word "order" has many meanings in mathematics. Before we spoke
of square matrices of order n (i.e., n×n matrices), but a non-singular matrix A,
considered as an element of the group GL(n, R), also has an order (perhaps infinite) in
the sense just defined. But it will always be clear from the context what meaning of the
word "order" we have in mind.

If we think of our example C_n of a cyclic group of order n, the following
theorem becomes almost obvious.


THEOREM 3. The order of an element a ∈ G (where G is any group) is equal
to Card⟨a⟩. If a is an element of finite order q, then

      ⟨a⟩ = {e, a, ..., a^{q-1}}   and   a^k = e ⟺ k = lq, l ∈ Z.

Proof. If a has infinite order, there is nothing left to prove. If a has order
q, then, by definition, all of the elements e, a, a^2, ..., a^{q-1} are distinct. We claim
that any power a^k must coincide with one of these elements, i.e., ⟨a⟩ = {e, a, ..., a^{q-1}}.
To see this, we use the division algorithm in Z (see subsection 3 of §8 Ch. 1) to write k
in the form

      k = lq + r,   0 ≤ r ≤ q-1.

Then, using the rules in Theorem 2, we obtain

      a^k = (a^q)^l a^r = e a^r = a^r.

In particular, a^k = e ⟺ r = 0 ⟺ k = lq. □
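The reduction a^k = a^r with r the remainder of k on division by q can be observed on a small example. The sketch below takes a permutation as the group element; the tuple encoding and the helper names `compose`, `power`, and `order` are assumptions of the example, and only exponents k ≥ 0 are tried.

```python
def compose(p, q):
    """Product pq with (pq)(i) = p(q(i)); p is stored as (p(1), ..., p(n))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def power(p, k):
    """p^k for an integer k >= 0, computed as a k-fold product."""
    result = tuple(range(1, len(p) + 1))   # the identity e
    for _ in range(k):
        result = compose(result, p)
    return result

def order(p):
    """The least q >= 1 with p^q = e."""
    e = tuple(range(1, len(p) + 1))
    q, x = 1, p
    while x != e:
        x = compose(x, p)
        q += 1
    return q

a = (2, 3, 1, 5, 4)      # sends 1->2, 2->3, 3->1 and interchanges 4 and 5
e = (1, 2, 3, 4, 5)
q = order(a)

cyclic_subgroup = {power(a, r) for r in range(q)}        # {e, a, ..., a^(q-1)}
reduces = all(power(a, k) == power(a, k % q) for k in range(40))
trivial_exactly_at_multiples = all((power(a, k) == e) == (k % q == 0)
                                   for k in range(40))
print(q, len(cyclic_subgroup), reduces, trivial_exactly_at_multiples)
```

Here Card⟨a⟩ and the order of a are computed independently and come out equal, as the theorem asserts.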


The property of a group being cyclic is very useful. Given a group, we are not 
always told in advance whether it is cyclic; sometimes this must be proved. An example is 


the following 


PROPOSITION. Let G be any group, and let a and b be two elements that
commute with one another. Suppose that a and b have finite orders s and t, and
that s and t are relatively prime. Then a and b generate a cyclic subgroup of
order st, and

      ⟨a,b⟩ = ⟨ab⟩.


Proof. First of all, D = ⟨a⟩ ∩ ⟨b⟩ = e, since if d ∈ D has order q, then,
by Theorem 3,

      d = a^i = b^j ⟹ d^s = (a^s)^i = e, d^t = (b^t)^j = e ⟹ q | s, q | t,

and, since s and t are relatively prime, this means that q = 1. Next, if
n = |⟨ab⟩|, we have (see relation (3) in §1)

      a^n b^n = (ab)^n = e ⟹ a^n = b^{-n} ∈ D = e ⟹ a^n = e, b^n = e
              ⟹ s | n, t | n ⟹ l.c.m.(s,t) | n ⟹ st | n,

since st = l.c.m.(s,t) · g.c.d.(s,t) = l.c.m.(s,t). But (ab)^{st} = (a^s)^t (b^t)^s = e (using
Theorem 2), so that n | st, and hence n = st. It remains to note that

      ⟨a,b⟩ = {a^i b^j | 0 ≤ i ≤ s-1, 0 ≤ j ≤ t-1} ⟹ Card⟨a,b⟩ ≤ st

and, since ⟨ab⟩ ⊂ ⟨a,b⟩ and Card⟨ab⟩ = st, it follows that ⟨a,b⟩ = ⟨ab⟩. □
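A small numerical instance of the proposition: two commuting permutations of orders 3 and 2 whose product has order 6 and generates the same subgroup. The particular elements and the helper names below are choices made for this sketch.

```python
def compose(p, q):
    """Product pq with (pq)(i) = p(q(i)); p is stored as (p(1), ..., p(n))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def cyclic_subgroup(p):
    """The set {e, p, p^2, ...} of all powers of p."""
    e = tuple(range(1, len(p) + 1))
    elements, x = {e}, p
    while x != e:
        elements.add(x)
        x = compose(x, p)
    return elements

a = (2, 3, 1, 4, 5)      # rotates 1, 2, 3; order s = 3
b = (1, 2, 3, 5, 4)      # interchanges 4 and 5; order t = 2
s, t = len(cyclic_subgroup(a)), len(cyclic_subgroup(b))

commute = compose(a, b) == compose(b, a)
ab = compose(a, b)

# Since a and b commute, <a, b> consists of the products a^i b^j.
generated = {compose(x, y) for x in cyclic_subgroup(a) for y in cyclic_subgroup(b)}
intersection = cyclic_subgroup(a) & cyclic_subgroup(b)
print(s, t, commute, len(cyclic_subgroup(ab)), generated == cyclic_subgroup(ab))
```

The intersection D of the two cyclic subgroups contains only the identity, which is the first step of the proof.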
We shall return to cyclic groups, but now we examine a richer special type of group, 


namely, transformation groups, which will be used to illustrate the various group theoretic 


concepts we introduce. 


4. The symmetric group and the alternating group. Let Ω be a finite set with
n elements. Since we shall not be concerned with the nature of those elements, we may as
well assume that Ω = {1, 2, ..., n}. The group S(Ω) (see Example 4 above) of all
one-to-one correspondences Ω → Ω is called the n-th symmetric group (or the
symmetric group on n elements), and is usually denoted S_n. The elements of S_n,
which are usually denoted by small Greek letters, are called permutations.

Written out visually, a permutation π: i ↦ π(i), i = 1, 2, ..., n, is represented
as follows, where all of the images are given explicitly:

\[
\pi = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix};
\]

here i_k = π(k), k = 1, ..., n, are the permuted elements 1, 2, ..., n. As usual,
e denotes the identity permutation (even though it is a Latin and not a Greek letter):
e(i) = i for all i.
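Two permutations σ, τ ∈ S_n are multiplied by the usual rule for composing maps:
(στ)(i) = σ(τ(i)). For example, if

\[
\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}, \qquad
\tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix},
\]

then we have

\[
\sigma\tau =
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}.
\]

Notice that

\[
\tau\sigma =
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 4 \end{pmatrix},
\]

so that στ ≠ τσ.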

We now find the order of the group S_n. A permutation σ can take the element 1
into any of n possible elements. Once σ(1) is fixed, we can choose σ(2) to be any of
the remaining n-1 elements; thus, there are n(n-1) possible choices for the pair
σ(1), σ(2). Then we can choose σ(3) to be any of the n-2 numbers which have not
already been taken to be σ(1) or σ(2). Continuing in this manner, we see that the number
of possible choices for the σ(1), σ(2), ..., σ(n) is n(n-1)(n-2) ... 3·2·1 = n! ("n
factorial"). Thus,

      Card S_n = |S_n| = (S_n : e) = n!

The permutations in S_n can be decomposed into products of simpler permutations.
We illustrate by drawing diagrams for the two examples σ, τ ∈ S_4 given above:


We abbreviate the permutation σ, which is a type of permutation called a "cycle" of
length 4 (meaning that it "rotates" 4 elements), by writing σ = (1234), or,
equivalently, σ = (2341) = (3412) = (4123). The permutation τ is the product of two
"disjoint" cycles of length 2, namely the cycles (14) and (23), and so can be written
τ = (14)(23). Note that σ^2 = (13)(24), σ^4 = e, τ^2 = e.


Returning to the general case, we call two elements i, j ∈ Ω equivalent under a
cyclic subgroup ⟨π⟩ of S_n (or simply π-equivalent) if there is an integer s for
which j = π^s(i). Since S_n is a finite group, all of its subgroups are
also finite. By Theorem 3, if Card⟨π⟩ = q, we may take 0 ≤ s < q. This relation
is in fact an equivalence relation, i.e., it is reflexive, symmetric, and transitive (see
subsection 2 of §6 Ch. 1), since i = π^0(i) = e(i); j = π^s(i) ⟹ i = π^{-s}(j); and
j = π^s(i), k = π^t(j) ⟹ k = π^{s+t}(i). By the usual property of equivalence relations, we
obtain a partition

      Ω = Ω_1 ∪ ... ∪ Ω_p      (1)

of the set Ω into pair-wise disjoint subsets (equivalence classes) Ω_1, ..., Ω_p, which
are called π-orbits. This name makes sense: every i ∈ Ω belongs to precisely one
orbit, and if Ω_k is the orbit to which i belongs, then Ω_k consists of the images of
i under the action of the powers of π: i, π(i), π^2(i), ..., π^{ℓ_k-1}(i), where ℓ_k = |Ω_k|
is the length of the π-orbit Ω_k. It is obvious that ℓ_k ≤ q = Card⟨π⟩; π^{ℓ_k}(i) = i;
and ℓ_k is the least positive integer such that π^{ℓ_k}(i) = i. If we set

\[
\pi_k = (i\ \pi(i)\ \ldots\ \pi^{\ell_k-1}(i)) =
\begin{pmatrix}
i & \pi(i) & \cdots & \pi^{\ell_k-2}(i) & \pi^{\ell_k-1}(i) \\
\pi(i) & \pi^2(i) & \cdots & \pi^{\ell_k-1}(i) & i
\end{pmatrix},
\]

we obtain a permutation which is a cycle of length ℓ_k.
Sometimes for clarity we write the cycle (1 2 3 ... ℓ) as (1, 2, 3, ..., ℓ),
separating the numbers by commas. The cycle π_k is the permutation which leaves all
elements in the set Ω\Ω_k fixed and takes j to π(j) for all j ∈ Ω_k. We therefore
call π_s and π_t independent or disjoint cycles when s ≠ t, since they affect disjoint
sets of elements. Note that π_k^{ℓ_k} = e.


Thus, associated with the partition (1) we have a corresponding decomposition of the
permutation π into a product of disjoint cycles:

      π = π_1 π_2 ... π_p,      (2)

where all of the cycles commute with one another: π = π_1 π_2 ... π_p = π_{i_1} π_{i_2} ... π_{i_p}
for any rearrangement i_1, i_2, ..., i_p of the indices 1, 2, ..., p.
For example, we may arrange the cycles so that ℓ_1 ≥ ℓ_2 ≥ ... ≥ ℓ_m > ℓ_{m+1} = ... = ℓ_p = 1
(i.e., the last p-m cycles correspond to orbits consisting of one element, so that as
permutations π_{m+1} = ... = π_p = e). Since a cycle of length one acts as the identity, it
is natural to omit such cycles in (2), and write

      π = π_1 π_2 ... π_m,   ℓ_k > 1,   1 ≤ k ≤ m.      (3)


For example, we write the permutation

\[
\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 3 & 4 & 5 & 1 & 7 & 6 & 8 \end{pmatrix}
\]

in the form

      π = (1 2 3 4 5)(6 7)(8) = (1 2 3 4 5)(6 7).      (4)

It may seem a little unpleasant that, for example, (1 2 3 4 5)(6 7) can be considered
as a permutation in S_n for any n ≥ 7, since the total number of elements in the set is
not indicated in the notation; but when n is fixed in a given context, there is no ambiguity.

Moreover, we claim that the decomposition of a permutation into a product of cycles
is unique. Suppose that we have another decomposition π = α_1 α_2 ... α_r into a product of
disjoint cycles, and let i be an element which is not fixed by π. Then π_s(i) ≠ i and
α_t(i) ≠ i for one (and only one) of the π_1, ..., π_m and one and only one of the
α_1, ..., α_r. We have

      π^k(i) = π_s^k(i) = α_t^k(i),   k = 0, 1, 2, ... .

But a cycle is uniquely determined by the action of all of its powers on any one element
which it does not leave fixed. Thus, π_s = α_t. We continue in this way, using induction
on m (or r) to show that the cycles occurring in the two decompositions are the same.

We have proved:

THEOREM 4. Every permutation π ≠ e in S_n is a product of disjoint cycles of
length ≥ 2. This decomposition is uniquely determined except for the order of the
cycles. □
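The orbit construction behind Theorem 4 is effectively an algorithm: start at an unvisited element, follow π until the starting point recurs, record the cycle, and repeat. A minimal sketch, assuming permutations are stored as tuples (p(1), ..., p(n)) and using a hypothetical function name:

```python
def cycle_decomposition(p):
    """Disjoint cycles of length >= 2 of the permutation (p(1), ..., p(n))."""
    seen = set()
    cycles = []
    for start in range(1, len(p) + 1):
        if start in seen:
            continue
        # Trace the orbit start, p(start), p(p(start)), ... until it closes up.
        cycle = [start]
        seen.add(start)
        j = p[start - 1]
        while j != start:
            cycle.append(j)
            seen.add(j)
            j = p[j - 1]
        if len(cycle) > 1:       # cycles of length one are omitted
            cycles.append(tuple(cycle))
    return cycles

pi = (2, 3, 4, 5, 1, 7, 6, 8)
print(cycle_decomposition(pi))
print(cycle_decomposition((2, 3, 4, 1)), cycle_decomposition((4, 3, 2, 1)))
```

Each cycle is written starting from its smallest element, which is one way of fixing the otherwise arbitrary starting point and order of the cycles.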


This decomposition (3) is convenient for many reasons. For example, it makes it 


easy to find the order of a permutation. 


COROLLARY 1. The order of a permutation π ∈ S_n (i.e., the order of the cyclic
subgroup ⟨π⟩) is equal to the least common multiple of the lengths of the disjoint cycles
in the decomposition of π.

Proof. As noted before, the disjoint cycles in the decomposition of π commute
with one another. Hence, by relation (4) in §1,

      π^s = π_1^s π_2^s ... π_m^s,   s = 0, 1, 2, ... .

Since the cycles π_1, ..., π_m are independent (they act on disjoint sets Ω_1, ..., Ω_m),
it follows that π^q = e ⟺ π_k^q = e for k = 1, ..., m. Thus, q is a multiple of all
of the orders of the cycles π_k, which, we have seen, coincide with their lengths ℓ_k. If
q is the least natural number for which π^q = e, then q = Card⟨π⟩ and
q = l.c.m.(ℓ_1, ..., ℓ_m) is the integer defined in subsection 2 of §8 Ch. 1 (see also the
proposition at the end of subsection 3). □
As an example, we can immediately say that the permutation (4) has order 10. As another example, suppose we want to know the maximum possible order of an element in $S_8$. If we go through all possible ways that 8 can be written as a sum of positive integers (in non-increasing order), we see that the following numbers occur as orders of elements $\pi \ne e$ in $S_8$: 2, 3, 4, 5, 6, 7, 8, 10, 12, 15. An example of a permutation of maximal order 15 is $(1\,2\,3\,4\,5)(6\,7\,8)$.
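As a hedged illustration of Corollary 1, the following Python sketch computes the order of a permutation as the least common multiple of its cycle lengths, and lists the orders occurring in $S_8$ by running through the partitions of 8. The 0-based tuple encoding of a permutation and the helper names `cycle_lengths`, `order`, and `partitions` are assumptions made for this sketch; they do not come from the text.

```python
from math import lcm  # math.lcm requires Python 3.9 or later


def cycle_lengths(perm):
    """Lengths of all cycles of perm, where perm[i] is the image of i (0-based)."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths


def order(perm):
    """Order of perm: the least common multiple of its cycle lengths."""
    return lcm(*cycle_lengths(perm))


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


# Permutation (4), i.e. (1 2 3 4 5)(6 7)(8), rewritten with 0-based entries.
pi = (1, 2, 3, 4, 0, 6, 5, 7)
print(order(pi))

# Each partition of 8 is a possible cycle type, so these are all orders in S_8.
orders_in_S8 = sorted({lcm(*p) for p in partitions(8)})
print(orders_in_S8)
```

The list produced this way also contains 1, the order of the identity, which corresponds to the partition of 8 into eight ones.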
Definition. A cycle of length 2 is called a transposition.

A transposition has the form $\tau = (i\,j)$; it leaves fixed all elements besides $i$ and $j$. Theorem 4 implies the following

COROLLARY 2. Every permutation $\pi \in S_n$ is a product of transpositions.

Proof. By Theorem 4, it suffices to write each cycle as a product of transpositions. But this can be done, for example, by taking
$$(1\,2\,\dots\,l{-}1\;\,l) = (1\,l)(1\;\,l{-}1) \cdots (1\,3)(1\,2). \quad □$$

Corollary 2 can be expressed using the notion of a set of generators of a group (see subsection 2):
$$S_n = \langle (1\,2), \dots, (1\,n), (2\,3), \dots, (2\,n), \dots, (n{-}1\;\,n) \rangle.$$
Of course, this set of generators is not a minimal set. For example,
$$S_3 = \langle (1\,2), (1\,3), (2\,3) \rangle = \langle (1\,2), (1\,3) \rangle.$$


Notice that we cannot hope for any uniqueness assertion about expressing a permutation as a product of transpositions: in general, transpositions do not commute with one another, and the number of transpositions which appear when we write a permutation as a product of transpositions is not fixed. For example, in $S_4$ we have
$$(1\,2\,3) = (1\,3)(1\,2) = (2\,3)(1\,3) = (1\,3)(2\,4)(1\,2)(1\,4).$$
In fact, the non-uniqueness of the transposition decomposition is immediately clear if we note that $\sigma\tau^2 = \sigma$ for any transpositions $\sigma$ and $\tau$. Nevertheless, there is one thing about the transposition decomposition which is fixed. In order to discover it in the most natural possible way, we consider the action of $S_n$ on functions.


Definition. Suppose that $\pi \in S_n$ and $f(X_1, \dots, X_n)$ is a function of $n$ variables. Set
$$(\pi \circ f)(X_1, \dots, X_n) = f(X_{\pi^{-1}(1)}, \dots, X_{\pi^{-1}(n)}). \tag{5}$$
We say that the function $g = \pi \circ f$ is obtained by letting $\pi$ act on $f$.

For example, if $\pi = (1\,2\,3)$ and $f(X_1, X_2, X_3) = X_1 + 2X_2^2 + 3X_3^3$, then
$$\pi \circ f = X_3 + 2X_1^2 + 3X_2^3.$$
As in §1 of Ch. 3, a function $f$ is called skew-symmetric if $\tau \circ f = -f$ for every transposition $\tau$, i.e.,
$$f(\dots, X_j, \dots, X_i, \dots) = -f(\dots, X_i, \dots, X_j, \dots).$$
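LEMMA. Let $\alpha$ and $\beta$ be permutations in $S_n$. Then
$$(\alpha\beta) \circ f = \alpha \circ (\beta \circ f).$$

Proof. Using the definition (5), we have
$$((\alpha\beta) \circ f)(X_1, \dots, X_n) = f(X_{(\alpha\beta)^{-1}(1)}, \dots, X_{(\alpha\beta)^{-1}(n)}) = f(X_{\beta^{-1}(\alpha^{-1}(1))}, \dots, X_{\beta^{-1}(\alpha^{-1}(n))})$$
$$= (\beta \circ f)(X_{\alpha^{-1}(1)}, \dots, X_{\alpha^{-1}(n)}) = (\alpha \circ (\beta \circ f))(X_1, \dots, X_n). \quad □$$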


THEOREM 5. Let $\pi$ be a permutation in $S_n$, and let
$$\pi = \tau_1 \tau_2 \cdots \tau_k \tag{6}$$
be any decomposition of $\pi$ into a product of transpositions. Then the number
$$\varepsilon_\pi = (-1)^k, \tag{7}$$
which is called the "parity" (or "signature" or "sign") of $\pi$, is completely determined by $\pi$, and does not depend on which decomposition (6) is used, i.e., the parity of $k$ is always the same for a given $\pi$. In addition,
$$\varepsilon_{\alpha\beta} = \varepsilon_\alpha \varepsilon_\beta \tag{8}$$
for all $\alpha, \beta \in S_n$.


Proof. Take any skew-symmetric function $f$ in $n$ variables $X_1, \dots, X_n$. By the lemma, the action of $\pi$ on $f$ reduces to the composition of the actions of the transpositions $\tau_k, \tau_{k-1}, \dots, \tau_1$, i.e., to multiplication by $-1$ $k$ times:
$$\pi \circ f = \tau_1 \circ (\tau_2 \circ \cdots (\tau_k \circ f) \cdots) = (-1)^k f = \varepsilon_\pi f.$$
Since the left side of this equality depends only on $\pi$ (not on what choice of decomposition (6) is taken), it follows that the map $\varepsilon: \pi \mapsto \varepsilon_\pi$ defined by (7) must be completely determined by $\pi$, provided, of course, that $f$ is not the identically zero function. But we know that there exist skew-symmetric functions which are not identically zero; for example, the Vandermonde determinant $\Delta(X_1, \dots, X_n)$ is such a function.

Next, if we apply the permutation $\alpha\beta$ to $f$ and use the rule in the lemma, we obtain
$$\varepsilon_{\alpha\beta} f = (\alpha\beta) \circ f = \alpha \circ (\beta \circ f) = \alpha \circ (\varepsilon_\beta f) = \varepsilon_\beta (\alpha \circ f) = \varepsilon_\beta \varepsilon_\alpha f = (\varepsilon_\alpha \varepsilon_\beta) f,$$
which gives us (8). □

Definition. A permutation $\pi \in S_n$ is called even if $\varepsilon_\pi = 1$ and odd if $\varepsilon_\pi = -1$.

Thus, every transposition is an odd permutation.


COROLLARY 1. The even permutations in $S_n$ form a subgroup $A_n$ of order $|A_n| = n!/2$ (called the alternating group on $n$ elements).

Proof. By (8), we see that $\varepsilon_{\alpha\beta} = 1$ if $\varepsilon_\alpha = \varepsilon_\beta = 1$, and $\varepsilon_{\alpha^{-1}} = \varepsilon_\alpha$, since $\varepsilon_\alpha \varepsilon_{\alpha^{-1}} = \varepsilon_e = 1$. It is then easy to see that all of the group axioms are fulfilled in $A_n$.

We write $S_n$ as the disjoint union $S_n = A_n \cup \bar{A}_n$, where $\bar{A}_n$ is the set of all odd permutations in $S_n$. The map of $S_n$ to itself defined by the rule
$$\rho_{(12)}: \pi \mapsto (1\,2)\pi$$
is bijective. (Namely, it is injective, since $(1\,2)\alpha = (1\,2)\beta \implies \alpha = \beta$; then apply Theorem 3 of §5 Ch. 1. Or else, simply observe that $\rho_{(12)}^2$ is the identity map, from which bijectivity follows.) Since $\varepsilon_{(12)\pi} = -\varepsilon_\pi$, we have $\rho_{(12)}(A_n) = \bar{A}_n$ and $\rho_{(12)}(\bar{A}_n) = A_n$. Thus, the number of even permutations in $S_n$ coincides with the number of odd permutations; hence, $|A_n| = \tfrac{1}{2}|S_n| = n!/2$. □


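COROLLARY 2. Let $\pi \in S_n$ be decomposed into a product of disjoint cycles $\pi = \pi_1 \pi_2 \cdots \pi_m$ of lengths $l_1, l_2, \dots, l_m$. Then
$$\varepsilon_\pi = (-1)^{\sum_{k=1}^{m} (l_k - 1)}.$$

Proof. By Theorem 5, we have $\varepsilon_\pi = \varepsilon_{\pi_1 \cdots \pi_m} = \varepsilon_{\pi_1} \cdots \varepsilon_{\pi_m}$. We also have $\varepsilon_{\pi_k} = (-1)^{l_k - 1}$, since $\pi_k$ can be written as a product of $l_k - 1$ transpositions (see the proof of Corollary 2 of Theorem 4). We conclude:
$$\varepsilon_\pi = (-1)^{l_1 - 1} \cdots (-1)^{l_m - 1} = (-1)^{\sum_{k=1}^{m} (l_k - 1)}. \quad □$$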
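The formula for the sign in terms of cycle lengths is easy to try out. The following Python sketch is only an illustration under assumed conventions: permutations are 0-based tuples, the product `compose(a, b)` applies `b` first (right to left, as in the text), and the names `cycle_lengths`, `sign`, and `compose` are hypothetical helpers. It checks relation (8) and the count $|A_4| = 4!/2$ by brute force.

```python
from itertools import permutations


def cycle_lengths(perm):
    """Lengths of all cycles of perm, where perm[i] is the image of i (0-based)."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths


def sign(perm):
    """Sign computed as (-1) to the power sum(l_k - 1) over the cycle lengths l_k."""
    return (-1) ** sum(length - 1 for length in cycle_lengths(perm))


def compose(a, b):
    """Product ab: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


S4 = list(permutations(range(4)))
even = [p for p in S4 if sign(p) == 1]
multiplicative = all(sign(compose(a, b)) == sign(a) * sign(b) for a in S4 for b in S4)
print(len(even), multiplicative)
```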


We end this section by taking a break from serious things and considering the game "fifteen". Fifteen numbered flat square markers, all of the same size, are placed on a square board so as to occupy all but one of 16 squares of the same size. The free square can be used to move the markers horizontally or vertically (without lifting them from the board).

[Fig. 12, panels (a) and (b)]

Given an arbitrary set-up for the markers (see Fig. 12(a); we may suppose that we start with the free square in the lower right corner), we are required to move them to the set-up in Fig. 12(b). When is this possible? Elementary group theory killed this game when it was at the "height of fashion". Namely, we associate a permutation $\pi \in S_{15}$ to the diagrams 12(a) and (b). It is not hard to see (and it is worthwhile to really convince yourself of this) that it is possible to move the markers to the desired position if and only if the parity $\varepsilon_\pi$ of the permutation $\pi$ is 1, i.e., if and only if $\pi \in A_{15}$.
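The criterion can be turned into a short test. The Python sketch below rests on two assumptions that are not taken from the text: the target set-up of Fig. 12(b) is the standard one (markers 1 to 15 in reading order, free square last), and a board is a list of 16 integers read row by row with 0 for the free square. The function names are hypothetical.

```python
def sign(perm):
    """Sign of a permutation given as a list with perm[i] = image of i (0-based)."""
    seen, transpositions = set(), 0
    for start in range(len(perm)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        if length:
            # A cycle of length l is a product of l - 1 transpositions.
            transpositions += length - 1
    return (-1) ** transpositions


def can_reach_standard_setup(board):
    """True if the set-up can be moved to 1, 2, ..., 15 with the free square last.

    Assumes the free square (0) already sits in the lower right corner.
    """
    if sorted(board) != list(range(16)) or board[-1] != 0:
        raise ValueError("expected markers 1..15 and the free square 0 in the last position")
    perm = [marker - 1 for marker in board[:15]]
    return sign(perm) == 1


standard = list(range(1, 16)) + [0]
swapped = standard.copy()
swapped[13], swapped[14] = 15, 14          # markers 14 and 15 interchanged
rotated = [2, 3, 1] + list(range(4, 16)) + [0]  # markers 1, 2, 3 cyclically permuted

print(can_reach_standard_setup(standard))
print(can_reach_standard_setup(swapped))
print(can_reach_standard_setup(rotated))
```

The set-up with markers 14 and 15 interchanged corresponds to a single transposition, hence to an odd permutation, so the sketch reports it as unreachable.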


EXERCISES

1. Show that if $M = \langle S \rangle$ is the monoid generated by a set $S$, and if every element $s \in S$ has an inverse in $M$, then $M$ is a group.

2. Prove that a group is the same thing as a monoid $G$ in which every equation of the form $ax = b$ or $ya = b$, with $a, b \in G$, has a unique solution.

3. Show that the set $A_1(\mathbb{R})$ of so-called affine transformations $\varphi_{a,b}: x \mapsto ax + b$ ($a, b \in \mathbb{R}$; $a \ne 0$) of the real line $\mathbb{R}$ is a group with multiplication law $\varphi_{a,b}\,\varphi_{c,d} = \varphi_{ac,\,ad+b}$. The group $A_1(\mathbb{R})$ contains the subgroup $GL(1, \mathbb{R})$ of affine transformations which leave the point $x = 0$ fixed, and the subgroup of "translations" $x \mapsto x + b$.


4. The group $SL(2, \mathbb{Z})$ contains the elements $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$ of orders 4 and 3, respectively. Show that $\langle AB \rangle$ is an infinite cyclic subgroup of $SL(2, \mathbb{Z})$. This shows that the product of two elements of finite order in a group is not necessarily of finite order. What happens in an abelian group?


5. Prove that a group $G$ of even order $|G| = 2n$ must contain an element $g \ne e$ of order 2.
6. Prove that Sa = (A), sy goee p (abay)y 3 
7. Prove that $S_n = \langle (1\,2), (1\,2\,3 \dots n) \rangle$.

8. Prove that the alternating group $A_n$, $n \ge 3$, is generated by cycles of length 3; more precisely, prove that
$$A_n = \langle (1\,2\,3), (1\,2\,4), \dots, (1\,2\,n) \rangle.$$

9. Find the sign of the permutation


oe ee es 
Ce eon ene Me |) 
10. Let $\Omega = \{1, \dots, n\}$, and let $\Omega \times \Omega = \Omega^2$ be the cartesian square. Call a pair $(i, j) \in \Omega \times \Omega$ an inversion relative to the permutation $\sigma \in S_n$ (or simply a $\sigma$-inversion) if $i < j$ but $\sigma(i) > \sigma(j)$. Set
$$\operatorname{sgn} \sigma = \prod_{1 \le i < j \le n} \frac{\sigma(j) - \sigma(i)}{j - i}.$$
Since $(\sigma(j) - \sigma(i))/(j - i)$ is a non-zero rational number which is negative if and only if $(i, j)$ is a $\sigma$-inversion, and since $\sigma: \Omega \to \Omega$ is a bijective map, it follows that $\operatorname{sgn} \sigma = (-1)^k$, where $k$ is the total number of $\sigma$-inversions. If $\tau = (i\,j)$ is a transposition, then $\operatorname{sgn} \tau = -1$. It is easy to see that
$$(\sigma(j)\;\sigma(i)) \begin{pmatrix} \dots & i & \dots & j & \dots \\ \dots & \sigma(i) & \dots & \sigma(j) & \dots \end{pmatrix} = \begin{pmatrix} \dots & i & \dots & j & \dots \\ \dots & \sigma(j) & \dots & \sigma(i) & \dots \end{pmatrix},$$
so that a $\sigma$-inversion $(i, j)$ is not an inversion relative to the permutation $\tau\sigma$, where $\tau$ is the transposition $(\sigma(j)\;\sigma(i))$. Show that it is possible to find $k$ transpositions $\tau_i$ such that $\tau_k \cdots \tau_1 \sigma = e$, the identity. Hence, $\sigma = \tau_1 \tau_2 \cdots \tau_k$ and $\operatorname{sgn} \sigma = (-1)^k = \varepsilon_\sigma$, so we have two designations for the same thing: sgn and $\varepsilon$.

This gives us another useful method for determining the sign of a permutation. Suppose that the set of inversions relative to a permutation $\pi$ consists of five pairs $(1,5)$, $(2,5)$, $(3,5)$, $(4,5)$, $(6,7)$; then $\operatorname{sgn} \pi = -1$. In practice, this method amounts to counting the number of $j$ in the lower row of a permutation which are greater than $i$ but come before it.


11. Prove that a non-empty subset $H$ of a finite (multiplicative) group $G$ is a subgroup if $H$ is closed under multiplication. That is, in that case the requirement that $H$ has an identity element and an inverse $h^{-1}$ for each $h \in H$ is superfluous.

12. Give a possible set of generators for the multiplicative group of positive rational numbers.

13. Prove that the $k$-th power $\pi^k$ of the cycle $\pi = (1\,2 \dots n) \in S_n$ is the product of $d = \text{g.c.d.}(n, k)$ independent cycles, each of length $q = \text{l.c.m.}(n, k)/k = n/d$.

14. Suppose that $A, B \in M_n(\mathbb{R})$ and $(AB)^m = E$ for some integer $m$. Is it necessarily true that $(BA)^m = E$?
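The inversion-counting method described in Exercise 10 can be sketched in a few lines of Python. The encoding (a list holding the lower row of the two-row notation) and the names `inversions` and `sgn` are assumptions made for illustration. The sample permutation `sigma` is one permutation chosen here whose inversions are exactly the five pairs listed in the exercise; the text itself does not name it.

```python
from itertools import combinations


def inversions(lower_row):
    """All pairs (i, j), 1-based, with i < j and sigma(i) > sigma(j).

    lower_row[i - 1] holds sigma(i), i.e. the lower row of the two-row notation.
    """
    n = len(lower_row)
    return [(i, j) for i, j in combinations(range(1, n + 1), 2)
            if lower_row[i - 1] > lower_row[j - 1]]


def sgn(lower_row):
    """Sign as (-1) to the power of the number of inversions."""
    return (-1) ** len(inversions(lower_row))


sigma = [2, 3, 4, 5, 1, 7, 6]  # the permutation (1 2 3 4 5)(6 7) on seven symbols
print(inversions(sigma))
print(sgn(sigma))
```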


§3. Morphisms of groups

1. Isomorphisms. As already noted, the three rotations $\varphi_0, \varphi_1, \varphi_2$, counterclockwise through 0°, 120°, 240°, take the equilateral triangle $P_3$ to itself. But there are also three reflections about an axis of symmetry, which we denote $\psi_1, \psi_2, \psi_3$; in Fig. 13, the axes of symmetry are 1--1', 2--2', and 3--3'. For each of the six transformations of $P_3$ there is a corresponding permutation of the set of vertices of the triangle. We have the correspondence
$$\varphi_0 \sim e,\quad \varphi_1 \sim (1\,2\,3),\quad \varphi_2 \sim (1\,3\,2),\quad \psi_1 \sim (2\,3),\quad \psi_2 \sim (1\,3),\quad \psi_3 \sim (1\,2).$$
Since this exhausts all of $S_3$, we can say that the group $D_3$ of all symmetry transformations of an equilateral triangle is very much like the symmetric group $S_3$. In the same sense, the two groups $C_n$ (the cyclic group of order $n$, see the example in subsection 3 of §2) and $\langle (1\,2 \dots n) \rangle \subset S_n$ are very similar to one another. These examples, along with some general thought about the nature of groups, inevitably lead to a very natural question about the most essential properties of groups.

[Fig. 13]

At first glance, complete information is contained in the multiplication table for a group $G$, sometimes called the Cayley table.


And, in fact, many properties of a group can be seen from the Cayley table, i.e., from the $n \times n$ matrix $M = (m_{ij})$, where $n = (G:e)$, with entries $m_{ij} = g_i g_j \in G$. We note, for example, that in each row or column of $M$ every element of $G$ occurs exactly once (see the proof of Theorem 2 below). A group $G$ is abelian if and only if $M$ is symmetric, i.e., $m_{ij} = m_{ji}$. There are other such properties, but we soon see that it is rather hard to compare two multiplication tables, say for groups $G$ and $G'$, because the appearance of $M$ depends on how the elements of the group are indexed; moreover, the situation becomes more complicated when the groups are infinite.
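The two properties of the Cayley table just mentioned can be checked mechanically on small groups. In the following Python sketch the groups $S_3$ (as 0-based tuples, with right-to-left composition) and the additive group of integers modulo 6 are chosen only as examples, and `cayley_table`, `is_latin`, and `is_symmetric` are hypothetical helper names.

```python
from itertools import permutations


def compose(a, b):
    """Product ab of permutations: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


def cayley_table(elements, op):
    """Matrix M with entries m_ij = g_i g_j for the given listing of the group."""
    return [[op(x, y) for y in elements] for x in elements]


def is_latin(table, elements):
    """Every element occurs exactly once in each row and in each column."""
    target = sorted(elements)
    rows_ok = all(sorted(row) == target for row in table)
    cols_ok = all(sorted(col) == target for col in zip(*table))
    return rows_ok and cols_ok


def is_symmetric(table):
    """m_ij == m_ji for all i, j, which holds exactly when the group is abelian."""
    n = len(table)
    return all(table[i][j] == table[j][i] for i in range(n) for j in range(n))


S3 = list(permutations(range(3)))
M_S3 = cayley_table(S3, compose)

Z6 = list(range(6))
M_Z6 = cayley_table(Z6, lambda x, y: (x + y) % 6)

print(is_latin(M_S3, S3), is_symmetric(M_S3))
print(is_latin(M_Z6, Z6), is_symmetric(M_Z6))
```

Both tables have six rows, yet only the second is symmetric, which already shows that the two groups cannot be "the same" in any reasonable sense.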


The best and most profound approach to comparing two groups is based on the notion of an isomorphism.

Definition. Two groups $G$ and $G'$ with operations $*$ and $\circ$ are called isomorphic if there exists a map $f: G \to G'$ such that:

(i) $f(a * b) = f(a) \circ f(b)$ for all $a, b \in G$;

(ii) $f$ is bijective.

Such a map $f$ is called an isomorphism between $G$ and $G'$. We use the notation $G \cong G'$ when $G$ and $G'$ are isomorphic.

We give the simplest properties of an isomorphism.

1) The identity goes to the identity. To see this, let $e$ be the identity of $G$. Since $e * a = a * e = a$, we have $f(e) \circ f(a) = f(a) \circ f(e) = f(a)$ for all $a \in G$; hence, $f(e) = e'$ is the identity in $G'$. Note that, in addition to property (i), we have used property (ii), since surjectivity of $f$ is needed to be able to write any element of $G'$ in the form $f(a)$. □


2) $f(a^{-1}) = f(a)^{-1}$. Namely, by 1), we have $f(a) \circ f(a^{-1}) = f(a * a^{-1}) = f(e) = e'$, where $e'$ is the identity in $G'$. Thus,
$$f(a)^{-1} = f(a)^{-1} \circ e' = f(a)^{-1} \circ (f(a) \circ f(a^{-1})) = (f(a)^{-1} \circ f(a)) \circ f(a^{-1}) = e' \circ f(a^{-1}) = f(a^{-1}). \quad □$$

3) The inverse map $f^{-1}: G' \to G$ (which exists by property (ii)) is also an isomorphism.

By the corollary of Theorem 2 of §5 Ch. 1, we need only verify property (i) for $f^{-1}$. Let $a', b' \in G'$. Then, since $f$ is bijective, we can write $a' = f(a)$, $b' = f(b)$ for some $a, b \in G$. Because $f$ is an isomorphism, we have $a' \circ b' = f(a) \circ f(b) = f(a * b)$. This gives $a * b = f^{-1}(a' \circ b')$, and, since $a = f^{-1}(a')$ and $b = f^{-1}(b')$, it follows that
$$f^{-1}(a' \circ b') = f^{-1}(a') * f^{-1}(b'). \quad □$$


A simple verification shows that our correspondence ~ between the groups $D_3$ and $S_3$ in the example above is actually an isomorphism.

The function $f = \ln$ is an isomorphism of the multiplicative group of positive real numbers with the additive group of all real numbers. The basic property of the logarithm $\ln ab = \ln a + \ln b$ is precisely property (i) in the definition of an isomorphism. The inverse mapping is $x \mapsto e^x$.


We now prove two general theorems which illustrate the role of isomorphisms in group theory.

THEOREM 1. Any two cyclic groups of the same order (in particular, any two infinite cyclic groups) are isomorphic.

Proof. First, if $\langle g \rangle$ is an infinite cyclic group, then all powers $g^n$ are distinct, and we obtain an isomorphism $f: \langle g \rangle \to (\mathbb{Z}, +)$ by setting $g^n \mapsto n$. $f$ is obviously bijective, and the property $f(g^m g^n) = f(g^m) + f(g^n)$ follows from Theorem 2 of §2.

Now suppose that $G = \{e, g, \dots, g^{q-1}\}$ and $G' = \{e', g', \dots, g'^{\,q-1}\}$ are two cyclic groups of order $q$ (where we are using multiplicative notation for the group operation in both groups). We define a bijective map by setting
$$f: g^k \mapsto g'^{\,k}, \qquad k = 0, 1, \dots, q-1.$$
Setting $n + m = lq + r$, $0 \le r \le q-1$, for any $n, m = 0, 1, \dots, q-1$ and reasoning as in the proof of Theorem 3 of §2, we have
$$f(g^n g^m) = f(g^{n+m}) = f(g^r) = g'^{\,r} = g'^{\,n+m} = g'^{\,n} g'^{\,m} = f(g^n) f(g^m). \quad □$$


THEOREM 2 (Cayley). Any finite group of order $n$ is isomorphic to some subgroup of the symmetric group $S_n$.

Proof. Let $G$ be a group with $n = |G|$. We may take $S_n$ to be the group of all bijective maps of the set $G$ to itself, since the nature of the elements permuted by the elements of $S_n$ is immaterial.

For any $a \in G$, consider the map $L_a: G \to G$ given by the formula
$$L_a(g) = ag.$$
If $e = g_1, g_2, \dots, g_n$ are all elements of $G$, then $ag_1, \dots, ag_n$ are the same elements but in a different order (recall the Cayley table!). (To see why these elements are distinct, we have:
$$ag_i = ag_k \implies a^{-1}(ag_i) = a^{-1}(ag_k) \implies (a^{-1}a)g_i = (a^{-1}a)g_k \implies g_i = g_k.)$$
Hence, $L_a$ is a bijective map (permutation), whose inverse will be $L_a^{-1} = L_{a^{-1}}$. Of course, $L_e$ is the identity permutation. If we use the associativity of the group operation, we obtain: $L_{ab}(g) = (ab)g = a(bg) = L_a(L_b g)$, i.e., $L_{ab} = L_a \circ L_b$.

Thus, the set $L_{g_1}, L_{g_2}, \dots, L_{g_n}$ forms a subgroup -- call it $H$ -- in the group of all bijective maps of $G$ to itself, i.e., in $S_n$. The group $G$ is isomorphic to the subgroup $H$ using the correspondence $a \mapsto L_a$, which, by what was said in the last paragraph, has all of the properties of an isomorphism. □
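The construction in the proof can be carried out explicitly for a small group. The Python sketch below takes $S_3$ merely as a convenient abstract group of order 6 and realizes each left translation $L_a$ as a permutation of the indices $0, \dots, 5$ of a fixed listing of the elements. The listing order and the helper name `left_translation` are assumptions of the sketch.

```python
from itertools import permutations


def compose(a, b):
    """Product ab: apply b first, then a (works for tuples of any length)."""
    return tuple(a[b[i]] for i in range(len(b)))


def left_translation(a, elements, op):
    """L_a as a permutation of indices: the index of g is sent to the index of a*g."""
    index = {g: k for k, g in enumerate(elements)}
    return tuple(index[op(a, g)] for g in elements)


G = list(permutations(range(3)))  # a group of order n = 6
L = {a: left_translation(a, G, compose) for a in G}

# Each L_a is a permutation of {0, ..., 5}.
all_bijective = all(sorted(L[a]) == list(range(len(G))) for a in G)
# L_{ab} = L_a o L_b, so a -> L_a preserves products.
homomorphism = all(L[compose(a, b)] == compose(L[a], L[b]) for a in G for b in G)
# Distinct elements give distinct permutations.
injective = len(set(L.values())) == len(G)

print(all_bijective, homomorphism, injective)
```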


Despite its simplicity, Cayley's theorem has an important meaning for group theory. It shows the existence of a sort of "universal object" -- the family $\{S_n \mid n = 1, 2, \dots\}$ of symmetric groups -- in which all finite groups (considered up to isomorphism) live. The phrase "up to isomorphism" is so typical, not only of group theory, but of all mathematics, which tends to consider at once all objects having common properties; without such abstraction and generalization, the whole point of the subject would be lost.

If $G' = G$ in the definition of an isomorphism, we have the concept of an isomorphism $\varphi: G \to G$ of a group $G$ to itself. Such an isomorphism is called an automorphism of $G$. For example, the identity map $e_G: g \mapsto g$ (henceforth denoted simply 1) is an automorphism. But, in general, a group $G$ also has non-trivial automorphisms. Property 3) of isomorphisms shows that the inverse of an automorphism is also an automorphism. Furthermore, if $\varphi$ and $\psi$ are automorphisms of $G$, then $(\varphi \circ \psi)(ab) = \varphi(\psi(ab)) = \varphi(\psi(a)\psi(b)) = (\varphi \circ \psi)(a) \cdot (\varphi \circ \psi)(b)$ for any $a, b \in G$. Hence, the set $\operatorname{Aut}(G)$ of all automorphisms of a group $G$ forms a group, in fact, a subgroup of the group $S(G)$ of all bijective maps $G \to G$.


2. Homomorphisms. The group of automorphisms $\operatorname{Aut}(G)$ of a group $G$ contains a very special subgroup, which is denoted $\operatorname{Inn}(G)$ and is called the group of inner automorphisms. Its elements are the maps
$$I_a: g \mapsto aga^{-1}.$$
A simple exercise shows that $I_a$ really has all of the properties required of an automorphism, and that $I_a^{-1} = I_{a^{-1}}$, $I_e$ is the identity automorphism, and $I_a \circ I_b = I_{ab}$ (because
$$(I_a \circ I_b)(g) = I_a(I_b(g)) = I_a(bgb^{-1}) = abgb^{-1}a^{-1} = abg(ab)^{-1} = I_{ab}(g).)$$
This last fact about $I_a$ shows that the map
$$f: G \to \operatorname{Inn}(G), \qquad f(a) = I_a \text{ for } a \in G,$$
from the group $G$ to the group $\operatorname{Inn}(G)$ of inner automorphisms of $G$ satisfies property (i) in the definition of an isomorphism: $f(a) \circ f(b) = f(ab)$. However, property (ii) is not necessarily satisfied. For example, if $G$ is an abelian group, then $aga^{-1} = g$ for all $a, g \in G$, i.e., $I_a = I_e$ for all $a \in G$, and $\operatorname{Inn}(G)$ only consists of the identity $I_e$. The example of this map $f$ makes it natural to introduce the following


Definition. A map $f: G \to G'$ from the group $(G, *)$ to the group $(G', \circ)$ is called a homomorphism if
$$f(a * b) = f(a) \circ f(b), \qquad \forall a, b \in G$$
(in other words, property (ii) in the definition of an isomorphism is omitted).

By the kernel of the homomorphism $f$, we mean the set
$$\operatorname{Ker} f = \{g \in G \mid f(g) = e', \text{ where } e' \text{ is the identity of } G'\}.$$

A homomorphism from a group to itself is called an endomorphism.

In the definition of a homomorphism, $f$ need be neither injective nor surjective. We can make $f$ into a surjective map by replacing $G'$ by $\operatorname{Im} f \subset G'$, which is obviously a subgroup of $G'$. So the "main" difference between a homomorphism and an isomorphism is the presence of a non-trivial kernel $\operatorname{Ker} f$, which is, one might say, a measure of the non-injectivity of $f$. If $\operatorname{Ker} f = \{e\}$, then $f: G \to \operatorname{Im} f$ is an isomorphism.

Note that
$$f(a) = e',\ f(b) = e' \implies f(a * b) = f(a) \circ f(b) = e' \circ e' = e'$$
and
$$f(a^{-1}) = f(a)^{-1} = (e')^{-1} = e'.$$
Hence $\operatorname{Ker} f$ is a subgroup of $G$. Let $H = \operatorname{Ker} f \subset G$. Then (we are now omitting the $*$ and $\circ$):
$$f(ghg^{-1}) = f(g) f(h) f(g^{-1}) = f(g)\, e'\, f(g)^{-1} = e', \qquad \forall h \in H,\ g \in G,$$
i.e., $ghg^{-1} \in H$; hence, $gHg^{-1} \subset H$. If we replace $g$ by $g^{-1}$ here, we obtain $g^{-1}Hg \subset H$, so that $H \subset gHg^{-1}$. Thus
$$gHg^{-1} = H, \qquad \forall g \in G.$$
A subgroup which has this property is called a normal subgroup (sometimes an invariant subgroup or a normal divisor). We have thereby proved

THEOREM 3. The kernel of a homomorphism is always a normal subgroup. □


We shall see the importance of this fact somewhat later. For now, we note that far from every subgroup is normal. For example, in $S_3$ the cyclic subgroup $\langle (1\,2\,3) \rangle = A_3$ is normal, but $\langle (1\,2) \rangle = \{e, (1\,2)\}$ is not normal.
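The example in $S_3$ is small enough to verify by direct computation of all conjugates $ghg^{-1}$. The Python sketch below does this; the 0-based tuple encoding and the helper names `compose`, `inverse`, and `is_normal` are assumptions made for illustration.

```python
from itertools import permutations


def compose(a, b):
    """Product ab: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def is_normal(H, G):
    """True if g h g^{-1} lies in H for all g in G and h in H."""
    members = set(H)
    return all(compose(compose(g, h), inverse(g)) in members for g in G for h in H)


S3 = list(permutations(range(3)))
e = (0, 1, 2)
c = (1, 2, 0)                 # the cycle (1 2 3) written with 0-based entries
A3 = [e, c, compose(c, c)]    # the cyclic subgroup generated by (1 2 3)
H12 = [e, (1, 0, 2)]          # the subgroup generated by the transposition (1 2)

print(is_normal(A3, S3), is_normal(H12, S3))
```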


3. Glossary, Examples. The terms "surjective map" (map "onto"), "injective map" (imbedding), "bijective map" (one-to-one correspondence), which can be used for maps of any sets (with or without operations), are often replaced by other terms when used for groups (the same happens for other algebraic systems). We use the terms epimorphism (homomorphism "onto"), monomorphism (homomorphism whose kernel is the identity element), and isomorphism (a one-to-one homomorphism, i.e., a homomorphism which is both an epimorphism and a monomorphism). There is a tendency to replace the term homomorphism by the word morphism. These words are useful to know when reading mathematical literature, but the reader can, if he chooses, get along using only the terms isomorphism and homomorphism with the prepositions "into" and "onto".


We now give some further examples of group homomorphisms.

1) The additive group of integers $\mathbb{Z}$ maps homomorphically onto the finite cyclic group $\langle g \rangle$ of order $q$ if we set $f: n \mapsto g^n$ (see Theorem 2 of §2). Here $\operatorname{Ker} f = \{lq \mid l \in \mathbb{Z}\}$. Namely, it is clear that $\{lq\} \subset \operatorname{Ker} f$, and the reverse inclusion follows from Theorem 3 of §2.

2) The map $f: \mathbb{R} \to T = SO(2)$ of the additive group of real numbers onto the group $T$ of rotations of the plane about the origin, which is given by $f(\lambda) = \Phi_\lambda$ ($\Phi_\lambda$ is counterclockwise rotation through an angle of $2\pi\lambda$), is a homomorphism. Since rotation through a multiple of $2\pi$ coincides with rotation through an angle of zero, i.e., the identity map, we have: $\operatorname{Ker} f = \mathbb{Z}$. We also say that $f$ gives a homomorphism of $\mathbb{R}$ onto the unit circle $S^1$, since there is a one-to-one correspondence between $SO(2)$ and $S^1$, namely, $\Phi_\lambda$ corresponds to the point with polar coordinates $(1, 2\pi\lambda)$.

3) The general linear group $GL(n)$, consisting of matrices $A$ with entries in $\mathbb{R}$ and non-zero determinant, maps homomorphically onto the multiplicative group $\mathbb{R}^*$ of non-zero real numbers, if we set $f = \det$. The condition that $f$ is a homomorphism: $f(AB) = f(A) f(B)$, is merely another way of stating Theorem 5 of §2 Ch. 3. By definition, $SL(n) = \operatorname{Ker} f$.

4) Consider the cyclic group $C_2 = \langle -1 \rangle = \{1, -1\}$ of order 2. If we want, we can give this group by writing its Cayley table:
$$\begin{array}{r|rr} & 1 & -1 \\ \hline 1 & 1 & -1 \\ -1 & -1 & 1 \end{array}$$
The map $S_n \to C_2$ given by our function $\varepsilon = \operatorname{sgn}: \pi \mapsto \varepsilon_\pi$ is a homomorphism. Here $\operatorname{Ker} \varepsilon = A_n$, by the definition of the alternating group.


5) An infinite group can be isomorphic to a proper subgroup. For example, the additive group $(\mathbb{Z}, +)$ contains the proper subgroup $n\mathbb{Z} = \{nk \mid k \in \mathbb{Z}\}$, where $n > 1$ is a fixed natural number. It is easy to check that the map $\varphi_n: \mathbb{Z} \to n\mathbb{Z}$ given by $\varphi_n(k) = nk$ is an isomorphism. Incidentally, note that $\mathbb{Z}$ and $n\mathbb{Z}$ are infinite cyclic groups with generator $1$ or $-1$ and $n$ or $-n$, respectively; hence $\varphi_n$ and the map $k \mapsto -kn$ are all possible isomorphisms $\mathbb{Z} \to n\mathbb{Z}$.

6) The group $\operatorname{Aut}(G)$, and even a single element $\varphi \in \operatorname{Aut}(G)$ which is not the identity, can be a source of important information about the group $G$. Here is an example of this. Let $G$ be a finite group having an automorphism $\varphi$ of order 2 (i.e., $\varphi^2 = 1$) which has no fixed points:
$$a \ne e \implies \varphi(a) \ne a.$$
Suppose that $\varphi(a)a^{-1} = \varphi(b)b^{-1}$ for some $a, b \in G$. If we multiply this equality on the left by $\varphi(b)^{-1}$ and on the right by $a$, we see that $\varphi(b)^{-1}\varphi(a) = b^{-1}a$, i.e., $\varphi(b^{-1}a) = b^{-1}a$, which implies that $b^{-1}a = e$, and so $b = a$. Thus, as $a$ runs through the elements of $G$, so does $\varphi(a)a^{-1}$; equivalently, any element $g \in G$ can be written in the form $g = \varphi(a)a^{-1}$. But then
$$\varphi(g) = \varphi(\varphi(a)a^{-1}) = \varphi^2(a)\varphi(a^{-1}) = a\varphi(a)^{-1} = (\varphi(a)a^{-1})^{-1} = g^{-1}.$$
Thus, $\varphi$ is precisely the map $g \mapsto g^{-1}$. With this in mind, we obtain $ab = \varphi(a^{-1})\varphi(b^{-1}) = \varphi(a^{-1}b^{-1}) = \varphi((ba)^{-1}) = ba$, i.e., the group $G$ turns out to be abelian! In addition, $(G:e)$ is an odd number, since $G$ consists of $e$ and disjoint pairs of elements $g_i$, $g_i^{-1} = \varphi(g_i)$.


7) The following example shows how much one can alter the group operation without changing the group itself, i.e., only changing it to an isomorphic group (see also Exercise 3 of §1). Let $G$ be any group, and let $t$ be a fixed element of $G$. Introduce a new operation on the set $G$:
$$(g, h) \mapsto g * h = gth.$$
We immediately verify that $(g_1 * g_2) * g_3 = g_1 * (g_2 * g_3)$, i.e., the operation $*$ is associative. In addition, $g * t^{-1} = t^{-1} * g = g$, and $g * (t^{-1}g^{-1}t^{-1}) = (t^{-1}g^{-1}t^{-1}) * g = t^{-1}$; this means that $(G, *)$ is a group with identity element $e_* = t^{-1}$. The inverse of an element $g$ in $(G, *)$ is $g_*^{-1} = t^{-1}g^{-1}t^{-1}$. The map $f: g \mapsto gt^{-1}$ gives an isomorphism between $(G, \cdot)$ and $(G, *)$.

All of these examples illustrate the general principle that studying the morphisms of a group $G$ gives much information about $G$ itself.


4. Cosets of a subgroup. It is clear from the definition of a homomorphism $f: G \to G'$ that all of the elements of the set
$$a \operatorname{Ker} f = \{ab \mid b \in \operatorname{Ker} f\}, \qquad a \in G,$$
are mapped to the same element $f(a)$ in $G'$: $f(ab) = f(a) f(b) = f(a) e' = f(a)$ if $b \in \operatorname{Ker} f$. Conversely, if $f(g) = f(a)$, then $f(a^{-1}g) = f(a^{-1}) f(g) = f(a)^{-1} f(g) = e'$, so that $a^{-1}g = b \in \operatorname{Ker} f$ and $g = ab \in a \operatorname{Ker} f$. This fact shows the usefulness of partitioning $G$ into subsets of the form $a \operatorname{Ker} f$. We now forget about homomorphisms for a moment, and study such partitions in their own right.


Definition. Let $H$ be a subgroup of $G$. A left coset of $H$ in $G$ is a set of the form $gH$, consisting of all elements of the form $gh$, where $g$ is a fixed element of $G$ and $h$ runs through all elements of the subgroup $H$. The element $g$ is called a coset representative for $gH$.

We similarly define a right coset $Hg$. (Sometimes the terminology "right" and "left" is reversed, i.e., used to refer to the position of $H$ rather than of $g$; the important thing is to be consistent, whichever convention is adopted.) If $H = \operatorname{Ker} f$ is the kernel of a homomorphism, then $gH = Hg$, because $H$ is normal in $G$ (see subsection 2). Note that the subgroup $H$ itself is a coset: $H = He = eH$. But none of the other cosets can be a subgroup, because, if $gH$ were a subgroup, then we would have $e \in gH$, so that $g = h^{-1}$ for some $h \in H$, and $gH = h^{-1}H = H$.


THEOREM 4. Two left cosets of $H$ in $G$ must either coincide or be disjoint. The partition of $G$ into left cosets of $H$ gives an equivalence relation on $G$.

Proof. Suppose that the cosets $g_1H$ and $g_2H$ have an element in common: $a = g_1h_1 = g_2h_2$. Then $g_2 = g_1h_1h_2^{-1}$, and any element $g_2h$ of the coset $g_2H$ has the form $g_1h_1h_2^{-1}h = g_1h'$, where $h' = h_1h_2^{-1}h \in H$. Thus, $g_2H \subset g_1H$. We similarly prove that every element in $g_1H$ is contained in $g_2H$; hence, $g_1H = g_2H$.

Since each element $g \in G$ is contained in the coset $gH$, we may conclude that $G$ is a union of disjoint left cosets of $H$:
$$G = \bigcup_i g_iH.$$
By the general principle in §6 of Chapter 1, this partition induces an equivalence relation on $G$, which is defined in the obvious way:
$$a \sim b \iff a^{-1}b \in H.$$
If we want, we can verify reflexivity, symmetry, and transitivity of this relation directly:

$a \sim a$, because $a^{-1}a = e \in H$;

$a \sim b \implies a^{-1}b = h \in H \implies b^{-1}a = h^{-1} \in H \implies b \sim a$;

$a \sim b,\ b \sim c \implies a^{-1}b = h_1,\ b^{-1}c = h_2 \implies a^{-1}c = h_1h_2 \in H \implies a \sim c$. □

The analogous theorem holds for right cosets.

The partition into cosets arises naturally in permutation groups. Let $G = S_n$ be the symmetric group, acting on the set $\Omega = \{1, 2, \dots, n\}$. If we consider the set $H$ of elements $\pi \in S_n$ for which $\pi(n) = n$, then we easily see that $H$ is a subgroup of $S_n$, which can be identified with $S_{n-1}$. Let $\tau_n = e$, and let $\tau_i = (i\,n)$ be the transposition taking $n$ to $i$ ($i = 1, 2, \dots, n-1$). It is clear that
$$S_n = \bigcup_{i=1}^{n} \tau_i S_{n-1}.$$
Here is the partition of $S_3$ into left and right cosets of the subgroup $\langle (1\,2) \rangle = S_2$:
$$S_3 = \{e, (1\,2)\} \cup \{(1\,3), (1\,2\,3)\} \cup \{(2\,3), (1\,3\,2)\};$$
$$S_3 = \{e, (1\,2)\} \cup \{(1\,3), (1\,3\,2)\} \cup \{(2\,3), (1\,2\,3)\}.$$
We see that the set of left cosets $gS_2$ is not the same as the set of right cosets $S_2g'$. Nevertheless, the sets $\{gH\}$ and $\{Hg'\}$ are always in bijective correspondence, as follows:
$$gH \mapsto Hg^{-1}.$$
In fact, if, say, $h_1g_1^{-1} = h_2g_2^{-1}$, then $g_2 = g_1h_1^{-1}h_2$ and $g_1H = g_2H$. In particular, if $\{e, x, y, z, \dots\}$ is a set of left (respectively, right) coset representatives, then $\{e, x^{-1}, y^{-1}, z^{-1}, \dots\}$ is a set of right (respectively, left) coset representatives. Both sets have the same cardinality.
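The two partitions of $S_3$ displayed above, and the correspondence $gH \mapsto Hg^{-1}$, can be recomputed with a few lines of Python. This is only a sketch: the 0-based tuples and the variable names (`t12` for $(1\,2)$, `c123` for $(1\,2\,3)$, and so on) are conventions chosen here, with products taken right to left as in the text.

```python
def compose(a, b):
    """Product ab: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


e, t12, t13, t23 = (0, 1, 2), (1, 0, 2), (2, 1, 0), (0, 2, 1)
c123, c132 = (1, 2, 0), (2, 0, 1)
S3 = [e, t12, t13, t23, c123, c132]
H = [e, t12]  # the subgroup generated by (1 2)

left = {frozenset(compose(g, h) for h in H) for g in S3}   # cosets gH
right = {frozenset(compose(h, g) for h in H) for g in S3}  # cosets Hg

print(sorted(sorted(c) for c in left))
print(sorted(sorted(c) for c in right))

# Inverting every element of a left coset gH gives the right coset H g^{-1}.
inverted = {frozenset(inverse(x) for x in coset) for coset in left}
print(inverted == right)
```

Each partition has three blocks of two elements, in agreement with $6 = 3 \cdot 2$.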


We denote the set of all left cosets of $H$ in $G$ by the symbol $G/H$ (or $(G/H)_l$ if we have to consider simultaneously the set $(G/H)_r$ of right cosets). We call the cardinality of this set $\operatorname{Card} G/H$ the "index of the subgroup $H$ in $G$", which we denote $(G:H)$. (This agrees with the notation $(G:e)$ introduced before for the order $|G|$ of $G$, i.e., the number of cosets of the trivial subgroup $\{e\}$.) Since we have the one-to-one correspondence $h \mapsto gh$ between $H$ and $gH$ (see the proof of Cayley's theorem), it follows that $\operatorname{Card} gH = (H:e)$. We thus have the following simple formula:
$$(G:e) = (G:H)(H:e),$$
which implies the classical

THEOREM 5 (Lagrange). The order of a finite group is divisible by the order of any subgroup. □


COROLLARY. The order of any element divides the order of the group. A group whose order is a prime $p$ is always cyclic; such a group is unique up to isomorphism.

Proof. The order of an element $g \in G$ is the same as the order of the cyclic subgroup generated by $g$ (Theorem 3 of §2), and so divides $|G|$ by Theorem 5. Next, if $|G| = p$ is a prime, and if $H$ is a non-trivial subgroup (i.e., $H \ne \{e\}$), then Theorem 5 implies that $|H| = p$, and hence $H = G$. Thus, $G$ coincides with the cyclic subgroup generated by any element $g \ne e$. Since all cyclic groups of a given order are isomorphic (Theorem 1), we have the uniqueness assertion in the corollary. □

Lagrange's theorem leads one to want to find a subgroup of order $m$ for every $m$ dividing $n = |G|$. But this cannot always be done. For example, the reader can verify that the alternating group $A_4$, which has order 12, has no subgroup of order 6.
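The claim about $A_4$ can be confirmed by exhaustive search, since a non-empty subset of a finite group that is closed under multiplication is a subgroup (Exercise 11 of §2). The Python sketch below tests every subset of the requested size that contains the identity. The encoding and the helper names `is_closed` and `subgroups_of_order` are assumptions made for this illustration, and brute force is used here only because the group is tiny.

```python
from itertools import combinations, permutations


def compose(a, b):
    """Product ab: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


def sign(p):
    """Sign via the number of inversions of the lower row p."""
    inv = sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])
    return (-1) ** inv


def is_closed(subset):
    """True if the product of any two elements of subset stays in subset."""
    elements = set(subset)
    return all(compose(a, b) in elements for a in elements for b in elements)


def subgroups_of_order(group, m):
    """All subgroups of the given finite permutation group having exactly m elements."""
    identity = tuple(range(len(group[0])))
    others = [g for g in group if g != identity]
    candidates = ((identity,) + rest for rest in combinations(others, m - 1))
    return [c for c in candidates if is_closed(c)]


A4 = [p for p in permutations(range(4)) if sign(p) == 1]
counts = {m: len(subgroups_of_order(A4, m)) for m in (1, 2, 3, 4, 6, 12)}
print(counts)
```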

But in some groups the "converse Lagrange theorem" does hold. For example:

THEOREM 6. Every subgroup of a cyclic group is cyclic. The subgroups of the infinite cyclic group $(\mathbb{Z}, +)$ are precisely the (infinite cyclic) groups $(m\mathbb{Z}, +)$, $m \in \mathbb{N}$; and the subgroups of a cyclic group of order $q$ are in one-to-one correspondence with the (positive) divisors $d$ of $q$.

Proof. Let $A = \langle a \rangle$ be a cyclic group. For variety, let's use additive notation. Thus, every element has the form $ka$, where $k \in \mathbb{Z}$, or else $k = 0, 1, \dots, q-1$ if $A$ is a finite group of order $q$ (see Theorem 3 of §2). Let $B$ be a non-zero subgroup of $A$. If $ka \in B$ for some $k \ne 0$, then we also have $-ka \in B$. Among all elements $ka \in B$ with positive $k$, let $ma$ be the element for which $m$ is minimal. If we write any $k > 0$ in the form $k = lm + r$, $0 \le r < m$, we see that $ka \in B$ implies $ra = ka - l(ma) \in B$; hence, $r = 0$. Thus, $B = \langle ma \rangle$ is a cyclic group.

All infinite cyclic groups are isomorphic (by Theorem 1). So without loss of generality we may take $(\mathbb{Z}, +)$ to be the model of an infinite cyclic group. It has generator $1$ or $-1$, so that, by the last paragraph, any subgroup of $(\mathbb{Z}, +)$ is determined by a natural number $m$ and has the form
$$m\mathbb{Z} = \{mk \mid k \in \mathbb{Z}\} = \{0, \pm m, \pm 2m, \dots\}.$$
It is obvious that all of these subgroups are infinite.

Now suppose that $\langle a \rangle = \{0, a, \dots, (q-1)a\}$, $qa = 0$. We found that $B = \{0, ma, 2ma, \dots\}$, where $m \in \mathbb{N}$ and $sa \in B$ for $s \in \mathbb{N}$ only if $s$ is a multiple of $m$. We claim that $m$ divides $q$. In fact, let $q = dm + r$, $0 \le r < m$. Then
$$0 = qa = d(ma) + ra,$$
so that $ra = -d(ma) \in B$. Since $m$ is minimal, we have $r = 0$, and so $q = dm$. Thus,
$$B = \{0, ma, 2ma, \dots, (d-1)ma\} = mA$$
is a subgroup of $A$ of order $d$. As $m$ runs through all positive divisors of $q$, so does $d$, and we obtain exactly one subgroup for each order $d$ dividing $q$. □


COROLLARY. In a cyclic group $\langle a \rangle$ of order $q$, the subgroup of order $d \mid q$ is precisely the set of elements $b \in \langle a \rangle$ such that $db = 0$.

Proof. If $dm = q$, then $b \in B = mA$ implies $db = 0$. Conversely, suppose that $b = la \in \langle a \rangle$ and $db = 0$. The condition $dla = 0$ implies that $dl = qk = dmk$, so that $l = mk$, and $b = la = k(ma) \in mA$. □
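Theorem 6 and its corollary can be illustrated on the additive group of integers modulo $q$. In the Python sketch below the choice $q = 12$ and the function names `subgroup_generated` and `killed_by` are assumptions made for illustration; the generator $a$ is represented by the residue 1.

```python
def subgroup_generated(m, q):
    """The cyclic subgroup mA = {0, m, 2m, ...} of the integers modulo q."""
    return sorted({(k * m) % q for k in range(q)})


def killed_by(d, q):
    """All b modulo q with d*b = 0, the set appearing in the corollary."""
    return [b for b in range(q) if (d * b) % q == 0]


q = 12
divisors = [d for d in range(1, q + 1) if q % d == 0]

for d in divisors:
    m = q // d
    # The subgroup of order d is mA, and it coincides with {b : d*b = 0}.
    print(d, subgroup_generated(m, q), subgroup_generated(m, q) == killed_by(d, q))

# Every subgroup is cyclic, so listing <m> for all m lists every subgroup;
# the number of distinct ones equals the number of divisors of q.
distinct = {tuple(subgroup_generated(m, q)) for m in range(q)}
print(len(distinct) == len(divisors))
```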


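5. The monomorphism $S_n \to GL(n)$. Recall that a monomorphism of groups $G \to G'$ is an injective homomorphism from $G$ to $G'$.

For example, here is a monomorphism $f: S_3 \to GL(3)$:
$$e \mapsto \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (1\,2) \mapsto \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (1\,3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$
$$(2\,3) \mapsto \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad (1\,2\,3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad (1\,3\,2) \mapsto \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
The reader can check that $f$ is really a monomorphism, and that for each $\pi \in S_3$ the determinant of $f(\pi)$ is $\pm 1$, depending on the signature of the permutation $\pi$.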


THEOREM 7. There exists a monomorphism $f: S_n \to GL(n)$ such that the matrix
$f(\pi)$, $\pi \in S_n$, has determinant $|f(\pi)| = \varepsilon_\pi$.


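Proof. We shall write an $n \times n$ matrix $(a_{ij})$ as a sequence of columns:
$(a_{ij}) = (A^{(1)}, \ldots, A^{(n)})$. In particular, let

\[
E^{(1)} = \begin{pmatrix} 1\\ 0\\ \vdots\\ 0 \end{pmatrix}, \quad
E^{(2)} = \begin{pmatrix} 0\\ 1\\ \vdots\\ 0 \end{pmatrix}, \quad \ldots, \quad
E^{(n)} = \begin{pmatrix} 0\\ 0\\ \vdots\\ 1 \end{pmatrix}
\]

be the columns of the identity matrix $E$. We define the map $f: S_n \to GL(n)$ by setting

\[ f: \pi \mapsto f(\pi) = (E^{(\pi(1))}, E^{(\pi(2))}, \ldots, E^{(\pi(n))}). \tag{1} \]

Thus, $f(\pi)$ is an $n \times n$ matrix in which each row and each column has one 1 and the
other entries zero. It is easy to see that $f(\pi) \in GL(n)$.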

Let $\sigma$ and $\tau$ be any permutations, and let $\pi = \sigma\tau$ be their product. By
definition, the non-zero entries in the $i$-th row of $f(\sigma) = (a_{k\ell})$ and in the $j$-th
column of $f(\tau) = (b_{k\ell})$ are $a_{i,\sigma^{-1}(i)} = 1$ and $b_{\tau(j),j} = 1$, respectively. Thus, in
the matrix $f(\sigma) f(\tau) = (c_{ij})$ the condition $c_{ij} \neq 0$ is equivalent to $\sigma^{-1}(i) = \tau(j)$, i.e.,
$i = \sigma\tau(j) = \pi(j)$; but this means that $f(\sigma) f(\tau) = f(\sigma\tau)$. Consequently, $f$ is a
homomorphism.
The property $\operatorname{Ker} f = \{e\}$ is obvious, since it is clear from (1) that $f(\pi) = E \Rightarrow \pi = e$.
Thus, $f$ is a monomorphism.




Finally, since $f$, det, and $\varepsilon$ all preserve products, i.e.,
\[ |f(\sigma\tau)| = |f(\sigma) f(\tau)| = |f(\sigma)|\,|f(\tau)|, \qquad \varepsilon_{\sigma\tau} = \varepsilon_\sigma \varepsilon_\tau, \]
and since any $\pi$ can be decomposed into a product of transpositions, it
suffices to prove the equality $|f(\pi)| = \varepsilon_\pi$ when $\pi = (ij)$ is a
transposition. But then $f(\pi)$ is obtained from $E$ by interchanging the
$i$-th and $j$-th columns; hence, $|f(\pi)| = -|E| = -1 = \varepsilon_\pi$. □
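The three claims of Theorem 7 can be verified mechanically for a small n. The sketch below is an illustration only: permutations are stored as tuples of images with 0-based indices, products act as (st)(j) = s(t(j)), the signature is computed by counting inversions, and all function names are our own. It builds f(p) column by column as in (1) and checks that f is multiplicative, injective, and that det f(p) equals the signature.

```python
from itertools import permutations

def perm_matrix(p):
    """f(p): column j is the p(j)-th column of the identity matrix."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def compose(s, t):
    """Product st, acting as (st)(j) = s(t(j))."""
    return tuple(s[t[j]] for j in range(len(t)))

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sign(p):
    """Signature of p, computed from the number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(m):
    """Integer determinant by expansion along the first column."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for i, row in enumerate(m):
        minor = [r[1:] for k, r in enumerate(m) if k != i]
        total += (-1) ** i * row[0] * det(minor)
    return total

n = 4
S_n = list(permutations(range(n)))
is_hom = all(matmul(perm_matrix(s), perm_matrix(t)) == perm_matrix(compose(s, t))
             for s in S_n for t in S_n)
is_injective = len({tuple(map(tuple, perm_matrix(p))) for p in S_n}) == len(S_n)
det_is_sign = all(det(perm_matrix(p)) == sign(p) for p in S_n)
print(is_hom, is_injective, det_is_sign)
```

With the opposite convention for products of permutations one would obtain an anti-homomorphism instead, so the choice (st)(j) = s(t(j)) matters here.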


Matrices of the form $f(\pi)$, $\pi \in S_n$, are called permutation matrices. The
restriction of the monomorphism $f$ to $A_n$ is a monomorphism into $SL(n, \mathbb{R})$. Given
any finite group $G$, the composition $f \circ L$ of the map $L: G \to S_n$ (see Theorem 2)
and the map $f: S_n \to GL(n)$ gives a monomorphism $G \to GL(n)$.


Using Theorem 7 , we can easily prove the so-called theorem on the complete 


expansion of the determinant: 


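THEOREM 8. The determinant

\[
|A| = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
\]

can be written as a sum of $n!$ terms, each of which is a product of $n$ entries in $A$:

\[ \det A = \sum_{\pi \in S_n} \varepsilon_\pi\, a_{\pi(1),1}\, a_{\pi(2),2} \cdots a_{\pi(n),n}. \tag{2} \]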
n 
Proof. Let $|A^{(1)}, \ldots, A^{(j-1)}, E^{(i)}, A^{(j+1)}, \ldots, A^{(n)}|$ denote the determinant
obtained from $|A|$ by replacing the column $A^{(j)}$ with the $i$-th column $E^{(i)}$ of the
identity matrix. Using the formula for expanding a determinant along the $j$-th column, we
see that the cofactor $A_{ij}$ of the element $a_{ij}$ in $|A|$ can be written as the following
$n$-th order determinant:

\[ A_{ij} = |A^{(1)}, \ldots, A^{(j-1)}, E^{(i)}, A^{(j+1)}, \ldots, A^{(n)}|, \]

and so, expanding $|A|$ along the $j$-th column, we have:

\[ |A| = \sum_{i=1}^{n} a_{ij}\, |A^{(1)}, \ldots, A^{(j-1)}, E^{(i)}, A^{(j+1)}, \ldots, A^{(n)}|. \]


If we first do this for $j = 1$, and then evaluate each of the $n$ determinants in the sum by
repeating the method for $j = 2$, and so on, we end up with an expression for $\det A$
containing first $n$, then $n^2$, ..., finally $n^n$ determinants:

\[
\begin{aligned}
\det A &= \sum_{i_1} a_{i_1,1}\, |E^{(i_1)}, A^{(2)}, \ldots, A^{(n)}| \\
&= \sum_{i_1, i_2} a_{i_1,1}\, a_{i_2,2}\, |E^{(i_1)}, E^{(i_2)}, A^{(3)}, \ldots, A^{(n)}| = \cdots \\
&= \sum_{i_1, \ldots, i_n} a_{i_1,1}\, a_{i_2,2} \cdots a_{i_n,n}\, |E^{(i_1)}, E^{(i_2)}, \ldots, E^{(i_n)}|.
\end{aligned}
\]


Here $i_1, i_2, \ldots, i_n$ run through all sets of $n$ numbers in $\{1, 2, \ldots, n\}$ (with
repetitions allowed). In other words, taking all $n^n$ maps $\pi: \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$
(see Example 1 in subsection 2 of §1), where $\pi(1) = i_1, \ldots, \pi(n) = i_n$,
we write $\det A$ in the form

\[ \det A = \sum_{\pi} a_{\pi(1),1}\, a_{\pi(2),2} \cdots a_{\pi(n),n}\, |E^{(\pi(1))}, E^{(\pi(2))}, \ldots, E^{(\pi(n))}|. \]
T 


It remains for us to note that, if $\pi(i) = \pi(j)$ for any $i \neq j$, then the determinant
$|E^{(\pi(1))}, \ldots, E^{(\pi(n))}|$ has two identical columns, and hence vanishes. Consequently, the
terms in the above summation are only non-zero when $\pi$ is bijective, i.e., when it is a
permutation. But in that case, by Theorem 7, we have
$|E^{(\pi(1))}, \ldots, E^{(\pi(n))}| = |f(\pi)| = \varepsilon_\pi$. □
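Formula (2) translates directly into a short program. The following Python sketch is an illustration under our own assumptions (0-based indices, signature computed by counting inversions, hypothetical function names); it evaluates the sum over all permutations and compares the result with the expansion along the first column on one sample matrix.

```python
from itertools import permutations

def sign(p):
    """Signature of the permutation p (tuple of images), via inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_leibniz(a):
    """Sum over all permutations p of sign(p) * a[p(0)][0] * ... * a[p(n-1)][n-1]."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for j in range(n):
            term *= a[p[j]][j]  # the entry in row p(j), column j
        total += term
    return total

def det_first_column(a):
    """Recursive expansion along the first column, for comparison."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for i, row in enumerate(a):
        minor = [r[1:] for k, r in enumerate(a) if k != i]
        total += (-1) ** i * row[0] * det_first_column(minor)
    return total

A = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
print(det_leibniz(A), det_first_column(A))
```

The sum has n! terms, so this is a check of the formula rather than a practical way to compute determinants.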


Remark. Of course, Theorem 8 can be proved directly by induction on $n$. But
the appearance of the sign $\varepsilon_\pi$ becomes easier to understand when one adopts the point of
view of group theory. Theorem 7 is also of independent interest, in addition to its use in 
proving Theorem 8. 

Theorem 8 can be taken as the starting point for the theory of determinants (as is 
often done). Namely, after defining det A by (2), we would then prove all of the 
properties, including the formula for expanding det A along the elements of the first 


(or j-th) column, which was our point of departure in Chapter 3. 
EXERCISES 
1. Prove that, up to isomorphism, there are only a finite number p(n) of groups 


of given order n. 


2. Using Exercise 7 of §2, show that every finite group can be imbedded in a 


finite group with two generators (i.e., there exists a monomorphism into such a group). 
3. Prove that a subgroup of index 2 must be normal. 


4. Using Exercise 3, try to prove that, up to isomorphism, $S_3$ is the only
non-abelian group of order 6.


5. Try to show that all of the subgroups of the alternating group $A_4$ are depicted
in Fig. 14. Here we let $V_4$ denote the so-called Klein four-group:

\[ V_4 = \{e, (12)(34), (13)(24), (14)(23)\}; \]

[Fig. 14: diagram of the subgroups of $A_4$; surviving vertex labels include (123) and (14)(23)]

the other vertices correspond to cyclic groups whose generators are given in the diagram.


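6. Show that any group of order 4 is abelian, and is isomorphic to one of the
permutation groups: $U = \langle (1234) \rangle$ or the Klein four-group $V_4$. Alternately, it is
isomorphic to one of the matrix groups

\[
L_1 = \left\{
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix},
\begin{pmatrix} -1 & 0\\ 0 & -1 \end{pmatrix},
\begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix}
\right\} \subset GL(2, \mathbb{R}),
\]
\[
L_2 = \left\{
\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix},
\begin{pmatrix} -1 & 0\\ 0 & 1 \end{pmatrix},
\begin{pmatrix} -1 & 0\\ 0 & -1 \end{pmatrix}
\right\} \subset GL(2, \mathbb{R}).
\]

Write out explicitly isomorphisms $U \cong L_1$, $V_4 \cong L_2$.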


§4. Rings and fields


1. The definition and general properties of rings. The algebraic structures
$(\mathbb{Z},+)$ and $(\mathbb{Z},\cdot)$ were our first examples of monoids, and then $(\mathbb{Z},+)$ turned out to
be an additive abelian group (actually, a cyclic group). But usually one thinks of these two
structures together, as a single structure known as a ring. There are important relations
of arithmetic which involve combining the additive and multiplicative structures of $\mathbb{Z}$, the
most basic of which is the distributive law $(a+b)c = ac + bc$. This law seems trivial to
us only because we are so accustomed to using it. If we try, for example, to combine the
algebraic structures $(\mathbb{Z},+)$ and $(\mathbb{Z},\circ)$, where $n \circ m = n + m + nm$, we find that it is
not so easy to find a basic relationship between the two binary operations. Before proceeding
to more examples, we give the precise definition of a ring.


Definition. Let $R$ be a non-empty set with two binary operations $+$ (addition)
and $\cdot$ (multiplication) which satisfy the following conditions:

(R1) $(R,+)$ is an abelian group;

(R2) $(R,\cdot)$ is a semigroup;

(R3) The operations of addition and multiplication are related by the left and right
distributive laws (in other words: multiplication is distributive with respect to addition):

\[ (a + b)c = ac + bc, \qquad c(a + b) = ca + cb \]

for all $a, b, c \in R$.

Then $(R,+,\cdot)$ is called a ring.

We call $(R,+)$ the additive group of the ring, and we call $(R,\cdot)$ its
multiplicative semigroup. If $(R,\cdot)$ is a monoid, then we say that $(R,+,\cdot)$ is a ring with
unit.
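For a finite set the conditions (R1)-(R3) can be tested exhaustively. The sketch below is an illustration with assumptions of our own: the helper `ring_axiom_report` is hypothetical, and to keep the sets finite we work with residues modulo 6 rather than with all integers. It checks the usual addition and multiplication, and then replaces multiplication by the operation n∘m = n + m + nm mentioned earlier, which is associative but does not distribute over addition.

```python
from itertools import product

def ring_axiom_report(elements, add, mul):
    """Brute-force check of (R1), (R2), (R3) on a finite set."""
    els = list(elements)
    pairs = list(product(els, repeat=2))
    triples = list(product(els, repeat=3))
    zeros = [z for z in els if all(add(z, a) == a == add(a, z) for a in els)]
    r1 = (
        all(add(a, b) in els for a, b in pairs)                            # closure
        and all(add(add(a, b), c) == add(a, add(b, c)) for a, b, c in triples)
        and all(add(a, b) == add(b, a) for a, b in pairs)                  # abelian
        and len(zeros) == 1                                                # zero element
        and all(any(add(a, b) == zeros[0] for b in els) for a in els)      # negatives
    )
    r2 = (
        all(mul(a, b) in els for a, b in pairs)
        and all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a, b, c in triples)
    )
    r3 = all(
        mul(add(a, b), c) == add(mul(a, c), mul(b, c))
        and mul(c, add(a, b)) == add(mul(c, a), mul(c, b))
        for a, b, c in triples
    )
    return {"R1": r1, "R2": r2, "R3": r3}

m = 6
Z_m = range(m)
usual = ring_axiom_report(Z_m, lambda a, b: (a + b) % m, lambda a, b: (a * b) % m)
circle = ring_axiom_report(Z_m, lambda a, b: (a + b) % m,
                           lambda a, b: (a + b + a * b) % m)
print(usual)
print(circle)
```

The second report fails only at (R3): (a + b)∘c and a∘c + b∘c differ by c, which is non-zero for most c.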


It is customary to let 1 denote the unit element of a ring with unit. Sometimes, 
the existence of a unit element is stipulated in the definition of a ring, but we shall not do 
this, and shall maintain the distinction between a ring and a ring with unit. 

Sometimes in applications and in general ring theory (which now exists in a very 
developed form), one considers algebraic systems in which the axiom (R2) is either 
completely dispensed with or else replaced with a weaker axiom, depending on the concrete 
problem. In such cases one speaks of non-associative rings. But here we shall only deal 
with the usual (associative) rings. Thus, we shall freely use Theorem 1 of §1, which
allows us to disregard parentheses in a product $a_1 a_2 \cdots a_k$ of any number $k$ of
elements of a ring.


A subset $L$ of a ring $R$ is called a subring if
\[ x, y \in L \ \Rightarrow\ x - y \in L \ \text{ and } \ xy \in L, \]
i.e., if $L$ is a subgroup of the additive group and a subsemigroup of the multiplicative
semigroup of the ring.

It is clear that the intersection of any family of subrings of $R$ is a subring (the
argument is the same as in the case of groups), and so it makes sense to speak of the
subring $\langle T \rangle \subset R$ generated by the subset $T \subset R$. By definition, $\langle T \rangle$ is the
intersection of all subrings of $R$ which contain $T$. If $T$ itself is a subring, then
$\langle T \rangle = T$.

A ring is said to be commutative if $xy = yx$ for all $x, y \in R$ (unlike for groups,
the word "abelian" is not usually used for commutative rings!).

This notion of a ring is very broad. Even the class of commutative rings, which at 
first glance seems to be rather specialized, has been the object of intensive study for many 
decades, and the theory of commutative rings is now interwoven with algebraic geometry -- 


a beautiful mathematical discipline which overlaps with algebra, geometry and topology. 


Examples. 1) $(\mathbb{Z},+,\cdot)$, the ring of integers with the usual operations of
addition and multiplication. The set $m\mathbb{Z}$ of integers divisible by $m$ is a subring of
$\mathbb{Z}$ (which does not have a unit element if $m > 1$). Similarly, $\mathbb{Q}$ and $\mathbb{R}$ are rings
with unit, and the natural inclusions $\mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$ give a chain of subrings of the ring $\mathbb{R}$.

2) The properties of addition and multiplication in $M_n(\mathbb{R})$, which were introduced
and studied in detail in Chapter 2, show that $M_n(\mathbb{R})$ is a ring with unit element $1 = E$.
It is called the full matrix ring over $\mathbb{R}$, or else the ring of $n \times n$ matrices over $\mathbb{R}$.
This is one of the most important examples of a ring. Since matrices do not generally
commute with one another if $n > 1$, $M_n(\mathbb{R})$ is a non-commutative ring for $n > 1$.

The rings $M_n(\mathbb{Q})$ and $M_n(\mathbb{Z})$ of $n \times n$ matrices with entries in $\mathbb{Q}$ and $\mathbb{Z}$,
respectively, are contained as subrings in $M_n(\mathbb{R})$. Actually, $M_n(\mathbb{R})$ is full of a wide
variety of subrings. Many of them will arise in various contexts later in the book. We
further note that it is possible to consider the ring $M_n(R)$ of $n \times n$ matrices with entries
in any commutative ring $R$, since the sum or product of two matrices in $M_n(R)$ will
again have entries in $R$, and the distributive laws for $M_n(R)$ follow from the distributive
laws for $R$. All of this is a direct consequence of the rules for matrix operations, which
were summarized at the end of subsections 1 and 3 of §3 Ch. 2.


3) In many areas of mathematics the concept of a ring of functions plays a vital
role. Let $X$ be any set, and let $R$ be any ring. Let $R^X = \{X \to R\}$ denote the set
of all functions (i.e., all maps) $f: X \to R$, considered along with the two binary operations
pointwise sum $f + g$ and pointwise product $fg$, which are defined as follows:

\[ (f + g)(x) = f(x) \oplus g(x), \qquad (fg)(x) = f(x) \odot g(x) \]

($\oplus$ and $\odot$ are the addition and multiplication operations in $R$). Here multiplication is
obviously not the composition of functions which in the case of linear maps led us to the
ring $M_n$. Rather, pointwise multiplication reflects the point of view in calculus, where
$X = \mathbb{R}$, $R = \mathbb{R}$, and, for example, the product of tan and sin is
$\tan \cdot \sin: x \mapsto \tan x \cdot \sin x$, and not $\tan \circ \sin: x \mapsto \tan(\sin x)$.

It is easy to check that $R^X$ satisfies all of the ring axioms. For example, the
distributive law in $R$ gives

\[ [f(x) \oplus g(x)] \odot h(x) = f(x) \odot h(x) \oplus g(x) \odot h(x) \]

for any three functions $f, g, h \in R^X$ and any $x \in X$; by the definition of the operations
in $R^X$, this gives $(f + g)h = fh + gh$. If 0 and 1 denote the zero element and the
unit element in $R$, then the zero element and the unit element in $R^X$ are the constant
functions

\[ 0_X: x \mapsto 0, \qquad 1_X: x \mapsto 1. \]

If $R$ is commutative, then so is $R^X$.

The ring $R^X$ contains many subrings, which can be defined by various special
properties of functions. For example, let $X = [0,1]$ be the closed interval in $\mathbb{R}$,
and let $R = \mathbb{R}$. Then the ring $\mathbb{R}^{[0,1]}$ of all real-valued functions defined on $[0,1]$
contains the subring $\mathbb{R}^{[0,1]}_{\mathrm{bd}}$ of all bounded functions, the subring $\mathbb{R}^{[0,1]}_{\mathrm{cont}}$ of all
continuous functions, the ring $\mathbb{R}^{[0,1]}_{\mathrm{diff}}$ of all differentiable functions, and so on, since all
of these properties are preserved under addition (and subtraction) and multiplication of
functions.

To every element $a \in R$ there corresponds the constant function $a_X$ defined by
$a_X(x) = a$ for all $x \in X$, and the imbedding which takes each $a \in R$ to $a_X \in R^X$
allows us to consider $R$ as a subring of $R^X$.

4) Every additive abelian group $(A,+)$ has the structure of a ring with zero
multiplication if we set $xy = 0$ for all $x, y \in A$.

Many of the properties of rings are simply reformulations of properties of groups
or, more generally, of sets with an associative operation. For example, $a^m a^n = a^{m+n}$
and $(a^m)^n = a^{mn}$ for all non-negative integers $m$ and $n$ and for all $a \in R$
(compare with relation (2) in §1). Other properties, which are more specific properties
of rings resulting from the ring axioms, are suggested by the properties of $\mathbb{Z}$. We note
a few of them. First of all,

\[ a \cdot 0 = 0 \cdot a = 0 \quad \text{for all } a \in R. \tag{1} \]

Namely, $a + 0 = a \Rightarrow a(a + 0) = aa \Rightarrow a^2 + a \cdot 0 = a^2 \Rightarrow a^2 + a \cdot 0 = a^2 + 0 \Rightarrow a \cdot 0 = 0$
(and similarly $0 \cdot a = 0$).

Now suppose for a moment that $0 = 1$. We obtain: $a = a \cdot 1 = a \cdot 0 = 0$ for all
$a \in R$, i.e., $R$ only contains the element 0. Thus, $0 \neq 1$ in a non-trivial ring $R$.


Next, we have

\[ (-a)b = a(-b) = -(ab), \tag{2} \]

since, for example, (1) and the distributive laws imply

\[ 0 = a \cdot 0 = a(b - b) = ab + a(-b) \ \Rightarrow\ a(-b) = -(ab). \tag{3} \]

Since $-(-a) = a$, it follows from (2) that $(-a)(-b) = ab$ (in particular,
$(-1)(-1) = 1$) and $-a = (-1) \cdot a$.


The distributive laws imply the following general distributive law:

\[ (a_1 + \cdots + a_n)(b_1 + \cdots + b_m) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j, \]

which one easily derives by induction, first (with $m = 1$) using induction on $n$, and
then using induction on $m$. Now using (1), (2) and (3), we obtain

\[ n(ab) = (na)b = a(nb) \]

for all $n \in \mathbb{Z}$ and all $a, b \in R$.


Finally, we note the binomial formula of Newton

\[ (a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^i b^{n-i}, \tag{5} \]

which holds for all $a, b \in R$, but only if $R$ is a commutative ring. To prove (5), one
uses (4) and proceeds just as in §7 Ch. 1, where we considered the special case $R = \mathbb{Z}$.


2. Congruences. The ring of residue classes. According to Theorem 6 of §2,
the only non-zero subgroups of the group $(\mathbb{Z},+)$ are the groups $m\mathbb{Z}$, where $m$ runs
through the set $\mathbb{N}$ of natural numbers. But the set $m\mathbb{Z}$ is obviously closed under
multiplication as well as addition, and all of the ring axioms are satisfied. Thus, we have
the following assertion: every non-zero subring of $\mathbb{Z}$ has the form $m\mathbb{Z}$, where
$m \in \mathbb{N}$.

We now try to use the subring $m\mathbb{Z} \subset \mathbb{Z}$ to construct a non-zero ring having only
finitely many elements. To do this we introduce the

Definition. Two integers $n$ and $n'$ are said to be congruent mod $m$ (in words:
modulo $m$) if they have the same remainder when divided by $m$, i.e., if $n - n'$ is
divisible by $m$. In that case we write $n \equiv n' \pmod{m}$ or simply $n \equiv n' \ (m)$.

In this way $\mathbb{Z}$ is partitioned into equivalence classes of numbers congruent to one
another mod $m$; these classes are called residue classes mod $m$. Each residue class
has the form

\[ \{r\}_m = r + m\mathbb{Z} = \{r + mk \mid k \in \mathbb{Z}\}, \]

so that we may write

\[ \mathbb{Z} = \{0\}_m \cup \{1\}_m \cup \cdots \cup \{m-1\}_m. \tag{6} \]




Note that the residue classes are the cosets of the subgroup $m\mathbb{Z}$ in the additive
group $\mathbb{Z}$, and the partition (6) is the one given in Theorem 4 of §2.
By definition, $n \equiv n' \ (m) \Leftrightarrow n - n'$ is divisible by $m$. But the notation
$n \equiv n' \ (m)$ is more convenient than $m \mid (n - n')$, because it is possible to operate with
congruences in the same way as with equations. Namely, if $k \equiv k' \ (m)$ and $\ell \equiv \ell' \ (m)$,
then $k \pm \ell \equiv k' \pm \ell' \ (m)$ and $k\ell \equiv k'\ell' \ (m)$. In particular, $k \equiv k' \ (m) \Rightarrow ks \equiv k's \ (m)$
for any $s \in \mathbb{Z}$.

Thus, given two residue classes $\{k\}_m$ and $\{\ell\}_m$, we can define their sum,
difference or product independently of the choice of coset representative. In other words,
we have operations $\oplus$ (addition) and $\odot$ (multiplication) defined on the set $\mathbb{Z}_m = \mathbb{Z}/m\mathbb{Z}$
of residue classes mod $m$:

\[ \{k\}_m \oplus \{\ell\}_m = \{k + \ell\}_m, \qquad \{k\}_m \odot \{\ell\}_m = \{k\ell\}_m. \tag{7} \]

Since these operations are defined using the usual operations on integers, it follows that
$(\mathbb{Z}_m, \oplus, \odot)$, like $\mathbb{Z}$, is a commutative ring with unit $\{1\}_m = 1 + m\mathbb{Z}$. This ring is
called the ring of residue classes mod $m$. If one is dealing with a fixed $m$, one often
writes $\bar{k}$ instead of $\{k\}_m$, so that

\[ \bar{k} \oplus \bar{\ell} = \overline{k + \ell}, \qquad \bar{k} \odot \bar{\ell} = \overline{k\ell}. \]

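It is often especially convenient to forget about bars and curly brackets entirely, and simply
choose the so-called reduced system of residues mod $m$, namely, the set
$\{0, 1, \ldots, m-1\}$, and work with this fixed set of representatives. For example, if we
use this convention, we have: $-k = m - k$, $2(m-1) = -2 = m - 2$.

Thus, there is such a thing as finite rings. Here are three examples of the rings
$\mathbb{Z}_m$, with the addition and multiplication tables given separately:

    Z_2:   + | 0 1          · | 0 1
           --+----          --+----
           0 | 0 1          0 | 0 0
           1 | 1 0          1 | 0 1

    Z_3:   + | 0 1 2        · | 0 1 2
           --+------        --+------
           0 | 0 1 2        0 | 0 0 0
           1 | 1 2 0        1 | 0 1 2
           2 | 2 0 1        2 | 0 2 1

    Z_4:   + | 0 1 2 3      · | 0 1 2 3
           --+--------      --+--------
           0 | 0 1 2 3      0 | 0 0 0 0
           1 | 1 2 3 0      1 | 0 1 2 3
           2 | 2 3 0 1      2 | 0 2 0 2
           3 | 3 0 1 2      3 | 0 3 2 1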

The residue rings $\mathbb{Z}_m$ have been of interest to number theorists for a long time,
and in algebra they served as a point of departure for various important generalizations.


3. Ring homomorphisms and ideals. By (7), the map $f: n \mapsto \{n\}_m$ has the
following properties: $f(k + \ell) = f(k) \oplus f(\ell)$, $f(k\ell) = f(k) \odot f(\ell)$. This suggests that
we should call $f$ a homomorphism of the rings $\mathbb{Z}$ and $\mathbb{Z}_m$ and make the following
general definition.


Definition. Let $(R,+,\cdot)$ and $(R',\oplus,\odot)$ be two rings. A map $f: R \to R'$ is
called a homomorphism if it preserves both operations, i.e., if

\[ f(a + b) = f(a) \oplus f(b), \qquad f(ab) = f(a) \odot f(b). \]

Of course, in this case $f(0) = 0'$ and $f(na) = n f(a)$ for $n \in \mathbb{Z}$.


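The kernel of a homomorphism $f$ is the set
\[ \operatorname{Ker} f = \{a \in R \mid f(a) = 0'\}. \]

It is clear that $\operatorname{Ker} f$ is a subring of $R$. But $\operatorname{Ker} f$ is much more than just a subring.
Namely, for all $x \in R$ we have $x \cdot \operatorname{Ker} f \subset \operatorname{Ker} f$ (since for all $k \in \operatorname{Ker} f$ we have
$f(xk) = f(x) \odot f(k) = f(x) \odot 0' = 0'$) and similarly $(\operatorname{Ker} f) \cdot x \subset \operatorname{Ker} f$. Thus, if we
denote $L = \operatorname{Ker} f$, we have $RL \subset L$ and $LR \subset L$. A subring $L$ having these two
properties is called a (two-sided) ideal of the ring $R$. Thus, the kernel of a homomorphism
is always an ideal.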

As in the case of groups (see the glossary in subsection 3 of §3), a homomorphism
$f: R \to R'$ is called a monomorphism if $\operatorname{Ker} f = 0$, an epimorphism if its image (the set
of all $a' \in R'$ of the form $f(a)$ for $a \in R$) is all of $R'$, and an isomorphism if it is
both a monomorphism and an epimorphism. If $R$ and $R'$ are isomorphic, we write
$R \cong R'$.

The map $f: n \mapsto \{n\}_m$ is clearly an epimorphism $\mathbb{Z} \to \mathbb{Z}_m$ with kernel
$\operatorname{Ker} f = m\mathbb{Z}$. When we constructed $\mathbb{Z}_m$ we implicitly used the fact that $m\mathbb{Z}$ is an
ideal of the ring $\mathbb{Z}$.

It so happens that every non-zero subring in the ring $\mathbb{Z}$ is an ideal, but that is
unusual. For example, in the matrix ring $M_2(\mathbb{Z})$ the set

\[ \left\{ \begin{pmatrix} \alpha & \beta \\ 0 & \delta \end{pmatrix} \ \middle|\ \alpha, \beta, \delta \in \mathbb{Z} \right\} \]

is a subring but not an ideal.

The example of $m\mathbb{Z}$ suggests a method for constructing ideals (not necessarily all
ideals) in an arbitrary commutative ring $R$: if $a$ is an arbitrary fixed element of $R$,
then the set $aR$ is always an ideal in $R$. This is because

\[ ax + ay = a(x + y), \qquad (ax)y = a(xy). \]

We say that $aR$ is the principal ideal generated by the element $a \in R$.
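Returning to the upper triangular matrices in the ring of 2 x 2 integer matrices mentioned above, the claim that they form a subring but not an ideal can be illustrated numerically. The Python sketch below is ours: the helper names are hypothetical, closure is tested only on a finite sample of entries between -2 and 2 (which illustrates, but does not prove, the subring property), and a single explicit product shows the failure of the ideal property.

```python
from itertools import product

def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def sub2(a, b):
    return tuple(tuple(a[i][j] - b[i][j] for j in range(2)) for i in range(2))

def in_T(m):
    """Membership in the set of upper triangular matrices."""
    return m[1][0] == 0

entries = range(-2, 3)
sample = [((al, be), (0, de)) for al, be, de in product(entries, repeat=3)]

closed_under_products = all(in_T(matmul2(s, t)) for s in sample for t in sample)
closed_under_subtraction = all(in_T(sub2(s, t)) for s in sample for t in sample)

x = ((0, 0), (1, 0))   # an element of the full matrix ring, not in T
t = ((1, 0), (0, 0))   # an element of T
print(closed_under_products, closed_under_subtraction)
print(matmul2(x, t), in_T(matmul2(x, t)))  # the product xt leaves T
```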

Note that if we insist on defining rings to be rings with unit, then ideals are not
generally subrings. In that case, we define an ideal to be a subgroup of the additive group of
the ring which is taken to itself under left or right multiplication by any element of the ring.
If we are using this definition of a ring, then we also insist that $f(1) = 1'$ in the definition
of a homomorphism. Of course, even with our broader definition of a homomorphism, this
condition $f(1) = 1'$ holds whenever $f$ is an epimorphism.

Isomorphic rings are identical in their algebraic properties, and only properties
which are preserved under isomorphisms are of real mathematical interest. It is for this
reason, for example, that we allow ourselves to think of the ring $\mathbb{Z}_m$ either as a set of
residue classes or as an arbitrarily chosen set of representative numbers from the residue
classes.




4. The concept of quotient group and quotient ring. Normal subgroups of a group
and ideals in a ring have a common origin -- they are kernels of homomorphisms. This 
common element is reflected in the notion of forming a quotient, which we shall briefly 
discuss. We shall return to this theme in the second part of the book. 

Let us start with groups. The equivalence relation $\sim$ on a group $G$ which is
obtained from the partition of $G$ into cosets of a normal subgroup $H$ has a remarkable
property. Namely, if $a$ and $b$ are arbitrary elements of $G$, and if $a \sim c$ and
$b \sim d$, then, by definition (see the proof of Theorem 4 in §3), we have $a^{-1}c = h_1 \in H$,
$b^{-1}d = h_2 \in H$, and so

\[ (ab)^{-1}cd = b^{-1}a^{-1}cd = b^{-1}h_1 d = b^{-1}h_1 b\,(b^{-1}d) = h_1' h_2 \in H, \]

which means that $ab \sim cd$. Here we used the fact that $H$ is normal in $G$:
$b^{-1}h_1 b = h_1' \in H$. Thus,

\[ a \sim c, \ b \sim d \ \Rightarrow\ ab \sim cd. \]

This says that the multiplicative operation on $G$ induces a multiplication of the quotient
set $G/\!\sim$ (see subsection 3 of §6 Ch. 1), which we agreed to denote $G/H$.
It makes sense to speak of the product of any two subsets $A$ and $B$ of a group $G$,
where we mean the set $AB$ of all products $ab$ with $a \in A$ and $b \in B$. Because $G$
is associative, we have

\[ (AB)C = \{(ab)c\} = \{a(bc)\} = A(BC). \]

A subset $H \subset G$ is a subgroup if and only if $H^2 = H$ and $H^{-1} = \{h^{-1} \mid h \in H\} \subset H$.


From this point of view, the coset $aH$ is the product of the one-element set $\{a\}$
and the subgroup $H$. The product of two cosets $aH$ and $bH$ is the set $aH \cdot bH$,
which does not necessarily have to be a coset of $H$. For example, the partition of $S_3$
into cosets of $H = \{e, (12)\}$, which we considered in subsection 4 of §3, shows that

\[ H \cdot (13)H = (13)H \cup (23)H. \]
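The example with the cosets of H = {e, (12)} in the symmetric group on three letters can be reproduced by direct computation. In the Python sketch below (an illustration with our own conventions) the points 1, 2, 3 are renamed 0, 1, 2, a permutation is stored as the tuple of its images, and products act as (st)(j) = s(t(j)). The last part repeats the experiment with the normal subgroup of even permutations, where the product of two cosets is again a coset.

```python
from itertools import permutations

def compose(s, t):
    """Product st, acting as (st)(j) = s(t(j))."""
    return tuple(s[t[j]] for j in range(len(t)))

def set_product(A, B):
    """The set AB of all products ab with a in A and b in B."""
    return {compose(a, b) for a in A for b in B}

# e, (12), (13), (23) written on the points 0, 1, 2
e, t12, t13, t23 = (0, 1, 2), (1, 0, 2), (2, 1, 0), (0, 2, 1)
S3 = set(permutations(range(3)))

H = {e, t12}                                    # not a normal subgroup
left_cosets = {frozenset(set_product({g}, H)) for g in S3}
c13, c23 = set_product({t13}, H), set_product({t23}, H)
product_set = set_product(H, c13)               # the set H * (13)H
print(product_set == c13 | c23)                 # a union of two cosets
print(frozenset(product_set) in left_cosets)    # hence not a single coset

N = {e, (1, 2, 0), (2, 0, 1)}                   # the even permutations, normal
normal_case_ok = all(
    set_product(set_product({a}, N), set_product({b}, N))
    == set_product({compose(a, b)}, N)
    for a in S3 for b in S3
)
print(normal_case_ok)
```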


However, if $H$ is a normal subgroup of $G$, then the product of two cosets $aH$, $bH$
will turn out to be a coset of $H$. Namely, since $gH = Hg$ for all $g \in G$, it follows
that

\[ aH \cdot bH = a(Hb)H = a(bH)H = abH^2 = abH; \]

and the same reasoning as used above shows that the coset $abH$ does not depend on the
representatives $a$ and $b$ of the cosets $aH$ and $bH$. The properties

\[
\begin{aligned}
aH \cdot bH &= abH, \\
H \cdot aH &= aH \cdot H = aH, \\
a^{-1}H \cdot aH &= aH \cdot a^{-1}H = H
\end{aligned}
\]

show that we have the following

THEOREM 1. If $H$ is a normal subgroup of $G$, then the operation
$aH \cdot bH = abH$ gives the quotient set $G/H$ the structure of a group, which is called the
quotient group of $G$ by $H$. The coset $H$ is the identity element in $G/H$, and $a^{-1}H$
is the inverse of $aH$. □


If $G$ is a finite group, then the order of the quotient group $G/H$ is given by the
formula

\[ |G/H| = \frac{|G|}{|H|} = (G:H), \]

which should come as no surprise in view of Lagrange's theorem (subsection 4 of §3).


If $G$ is an abelian group whose binary operation is written additively, then the
operation induced on the quotient group $G/H$ is written

\[ (a + H) + (b + H) = (a + b) + H. \]

In this case, $G/H$ is often called the group $G$ modulo $H$. If we apply this to the pair
$G = \mathbb{Z}$ and $H = m\mathbb{Z}$, we also use the expression "the group $\mathbb{Z}$ modulo $m$".


We now proceed to the idea of a quotient ring $R/L$, where $R$ is a ring and $L$
is an ideal. We base the construction on the additive group of the ring $R$. Thus, the
elements of $R/L$ are the cosets $a + L$, which we call the residue classes of $R$ modulo
the ideal $L$. They are added by the usual rule:

\[ (a + L) \oplus (b + L) = (a + b) + L, \qquad \ominus(a + L) = -a + L. \tag{8} \]

We take the product of these residue classes to be

\[ (a + L) \odot (b + L) = ab + L. \tag{9} \]


We have to be careful that this multiplication is correctly defined, i.e., that it does not
depend on the choice of coset representatives. Suppose that $a' = a + x$, $b' = b + y$,
where $x, y \in L$. Then

\[ a'b' = ab + ay + xb + xy = ab + z, \]

where $z = ay + xb + xy$ is an element of $L$, because $L$ is a two-sided ideal. Hence,
$a'b'$ is in the same coset as $ab$, and this means that the product (9) is correctly
defined. For brevity we write $\bar{a} = a + L$, so that

\[ \bar{a} \oplus \bar{b} = \overline{a + b}, \qquad \bar{a} \odot \bar{b} = \overline{ab}. \]

In particular, $\bar{0} = L$ and $\bar{1} = 1 + L$ (if $R$ has a 1). We should also check that
the set $\bar{R} = R/L = \{\bar{a} \mid a \in R\}$ with the operations $\oplus$ and $\odot$ satisfies all the ring
axioms, but this is fairly obvious, because the operations on $\bar{R}$ are defined in terms of
the operations on elements of $R$. For example, the distributive law is verified as follows:

\[ (\bar{a} \oplus \bar{b}) \odot \bar{c} = \overline{a + b} \odot \bar{c} = \overline{(a + b)c} = \overline{ac + bc} = \overline{ac} \oplus \overline{bc} = \bar{a} \odot \bar{c} \oplus \bar{b} \odot \bar{c}. \]

All of this shows that the map

\[ \pi: a \mapsto \bar{a} \]

is an epimorphism of rings $R \to \bar{R}$ with kernel $\operatorname{Ker} \pi = L$. Starting with the special
case of the quotient ring $\mathbb{Z}_m = \mathbb{Z}/m\mathbb{Z}$ and the epimorphism $\mathbb{Z} \to \mathbb{Z}_m$, we have found
an analogous situation occurring in arbitrary rings.

It is worth mentioning, although it goes beyond our immediate goal (which is to
explain the construction of $\mathbb{Z}_m$ from a general algebraic point of view), that the quotient
rings of $R$ by its ideals essentially exhaust all possible images of $R$ under
homomorphisms. Namely, if $f: R \to R'$ is a homomorphism and $f(R)$ is the image of $R$
under $f$, then, if we consider $f(R) \subset R'$ in place of $R'$, we obtain an epimorphism.
In order to simplify notation, suppose that $f$ was an epimorphism from the very start,
i.e., that $f(R) = R'$. According to the general principle in subsection 3 of §6 Ch. 1, $f$
determines an equivalence relation $O_f$ on $R$; in our present situation, $O_f$ is the
partition of $R$ into cosets $a + \operatorname{Ker} f = C_a$. The map $f$ leads to a bijective
correspondence $f'$ between the elements $a'$ of $R'$ and the classes $C_a$, namely,
$f': C_a \mapsto a'$ if $a' = f(a)$. Under this correspondence $f'$ we have

\[
\begin{aligned}
f'(C_a + C_b) &= f'(C_{a+b}) = f(a + b) = f(a) + f(b) = f'(C_a) + f'(C_b), \\
f'(C_a C_b) &= f'(C_{ab}) = f(ab) = f(a) f(b) = f'(C_a) f'(C_b),
\end{aligned}
\]

so that the bijective map $f'$ is an isomorphism (for simplicity, we have denoted addition
and multiplication in the rings $\bar{R} = R/\operatorname{Ker} f$ and $R'$ in the same way).


We have proved the following 


THEOREM 2 (the fundamental theorem on ring homomorphisms). Any ideal $L$ of
a ring $R$ determines a ring structure on the quotient set $R/L$ (using the formulas (8)
and (9)), and $R/L$ is a homomorphic image of $R$ under a map having kernel $L$.
Conversely, any homomorphic image $R' = f(R)$ of the ring $R$ is isomorphic to the
quotient ring $R/\operatorname{Ker} f$. □


Remark. The right side of (9) is not, in general, the same thing as the product of
the residue classes $a + L$ and $b + L$ in the set-theoretic sense. For example, if
$R = \mathbb{Z}$ and $L = 8\mathbb{Z}$, then $24 \in 16 + 8\mathbb{Z}$ cannot be written in the form
$(4 + 8s)(4 + 8t)$, since the latter is always divisible by 16.
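The arithmetic behind this remark is easy to test. The short Python sketch below is an illustration only: it samples s and t in a finite range (the range -20..20 is an arbitrary choice of ours) and confirms that every product (4 + 8s)(4 + 8t) lies in the class 16 + 8Z and is divisible by 16, while 24 lies in the same class without being such a product. The divisibility by 16 holds for all integers s, t because (4 + 8s)(4 + 8t) = 16(1 + 2s)(1 + 2t).

```python
L = 8
sample = range(-20, 21)

# All set-theoretic products of elements of 4 + 8Z within the sample
products = {(4 + L * s) * (4 + L * t) for s in sample for t in sample}

all_in_class_of_16 = all((p - 16) % L == 0 for p in products)
all_divisible_by_16 = all(p % 16 == 0 for p in products)
in_class = (24 - 16) % L == 0     # 24 belongs to 16 + 8Z
is_product = 24 in products       # but 24 is not divisible by 16

print(all_in_class_of_16, all_divisible_by_16, in_class, is_product)
```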




5. Types of rings. Fields. In our familiar rings $\mathbb{Z}$, $\mathbb{Q}$ and $\mathbb{R}$, whenever
$ab = 0$ we must have either $a = 0$ or $b = 0$. But the ring of square matrices $M_n$
does not have this property. For example, using the notation $E_{ij}$ (see the proof of
Theorem 3 of §3 Ch. 2), we have $E_{ij} E_{k\ell} = 0$ if $j \neq k$, although, of course,
$E_{ij} \neq 0$ and $E_{k\ell} \neq 0$. One might think that this is because of the unpleasant phenomenon
of non-commutativity in $M_n$, but that has nothing to do with it. As we saw in
subsection 2, in the ring $\mathbb{Z}_4$ we have $2 \odot 2 = 0$ (despite the well-known platitude
"twice two is four"!).


Here are two more examples. 


Example 1. Pairs of numbers $(a, b)$ (where we may take $a, b$ in $\mathbb{Z}$, $\mathbb{Q}$ or
$\mathbb{R}$) with addition and multiplication defined by the formulas

\[
\begin{aligned}
(a_1, b_1) + (a_2, b_2) &= (a_1 + a_2,\ b_1 + b_2), \\
(a_1, b_1)(a_2, b_2) &= (a_1 a_2,\ b_1 b_2)
\end{aligned}
\]

obviously make up a commutative ring with unit $(1, 1)$. Here we encounter the same
phenomenon: $(1, 0) \cdot (0, 1) = (0, 0) = 0$.


Example 2. In the ring $\mathbb{R}^{\mathbb{R}}$ of real-valued functions (see Example 3 in
subsection 1), the functions $f: x \mapsto |x| + x$ (which is 0 when $x \le 0$) and
$g: x \mapsto |x| - x$ (which is 0 when $x \ge 0$) have the property that their product is the
zero function, even though $f \neq 0$ and $g \neq 0$.


Definition. If $ab = 0$ for $a \neq 0$ and $b \neq 0$ in the ring $R$, then $a$ is
called a left zero divisor and $b$ is called a right zero divisor (if $R$ is commutative,
then there is no distinction, and we speak simply of zero divisors). The zero element in $R$
can be considered the trivial zero divisor. If there are no zero divisors (except 0),
then $R$ is called a ring without zero divisors. If $R$ is a commutative ring with unit
$1 \neq 0$ not having zero divisors, then $R$ is called an integral domain.




THEOREM 3. A non-trivial commutative ring $R$ with unit is an integral domain if
and only if the law of cancellation holds:

\[ ab = ac, \ a \neq 0 \ \Rightarrow\ b = c \]

for all $a, b, c \in R$.

Proof. If $R$ has the cancellation law, then whenever $ab = 0 = a \cdot 0$, we then
have either $a = 0$ or else $a \neq 0$, in which case $b = 0$. Conversely, if $R$ is an
integral domain, then $ab = ac$, $a \neq 0 \Rightarrow a(b - c) = 0 \Rightarrow b - c = 0 \Rightarrow b = c$. □


If $R$ is a ring with unit, it is natural to consider the set of invertible elements: an
element $a$ is called invertible (or a unit, not to be confused with the alternative use of
this word for the element 1) if there exists an element $a^{-1}$ such that $aa^{-1} = a^{-1}a = 1$.
More precisely, one might want to speak of right invertible or left invertible elements (if
$ab = 1$ or $ba = 1$ can be solved for $b$, respectively). But if $R$ is commutative or
is without zero divisors, then right (or left) invertibility implies invertibility. Namely,
suppose, for example, that $ab = 1$ in a ring without zero divisors. Then $aba = a$,
so that $a(ba - 1) = 0$. Since $a \neq 0$, we must have $ba - 1 = 0$, i.e., $ba = 1$.
As an example, we already know that in the ring $M_n(\mathbb{R})$ or $M_n(\mathbb{Q})$ the invertible
elements are precisely the matrices with non-zero determinant.

An invertible element $a$ cannot be a zero divisor: $ab = 0 \Rightarrow a^{-1}(ab) = 0 \Rightarrow
(a^{-1}a)b = 0 \Rightarrow 1 \cdot b = 0 \Rightarrow b = 0$ (similarly $ba = 0 \Rightarrow b = 0$).


THEOREM 4. The set $U(R)$ of all invertible elements in a ring $R$ with unit is a
group under multiplication.

Proof. Since the set $U(R)$ contains the identity element, and multiplication is
associative in $R$, it remains for us to see that $U(R)$ is closed under multiplication, i.e.,
we must check that the product $ab$ of two invertible elements is invertible. But this is
obvious, since $b^{-1}a^{-1}$ can be seen to be the inverse of $ab$ (for example,
$(ab)(b^{-1}a^{-1}) = a(bb^{-1})a^{-1} = a \cdot 1 \cdot a^{-1} = aa^{-1} = 1$). □
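In a finite ring both statements (units are never zero divisors, and the units form a group) can be observed directly. The following Python sketch is an illustration of ours for the residue ring modulo m, using the representatives 0, ..., m-1; the function names are hypothetical and both sets are found by brute force rather than by any number-theoretic criterion.

```python
def units(m):
    """Residues a for which ab = 1 (mod m) has a solution b."""
    return [a for a in range(m) if any((a * b) % m == 1 for b in range(m))]

def zero_divisors(m):
    """Non-zero residues a with ab = 0 (mod m) for some non-zero b."""
    return [a for a in range(1, m)
            if any((a * b) % m == 0 for b in range(1, m))]

m = 12
U = units(m)
Z = zero_divisors(m)
closed = all((a * b) % m in U for a in U for b in U)            # products stay in U
has_inverses = all(any((a * b) % m == 1 for b in U) for a in U)  # inverses lie in U
disjoint = not set(U) & set(Z)
print(U, Z, closed, has_inverses, disjoint)
```

For m = 12 every non-zero residue is either a unit or a zero divisor; this dichotomy is special to finite rings and fails, for example, in the ring of integers.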




As an example, it is easy to see that $U(\mathbb{Z}) = \{\pm 1\}$ is a cyclic group of order 2.

If we replace the ring axiom (R2) by the much stronger axiom (R2'): the set 
R* of non-zero elements in R forms a group under multiplication, then we obtain a very 
interesting class of rings, called division rings or skew fields. Thus, a division ring must 


be a ring without zero divisors in which every non-zero element is invertible. A 


commutative division ring -- in which multiplication has essentially all of the properties 
of addition (i.e. , the non-zero elements form an abelian group) -- is called a field. Thus, 
to repeat: 


Definition. A field $K$ is a commutative ring with unit $1 \neq 0$ in which every
non-zero element is invertible. The group $K^* = U(K)$ is called the multiplicative group
of the field.
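A field is sort of a hybrid of two abelian groups -- the additive and the
multiplicative -- which are connected by the distributive law. A product of the form $ab^{-1}$
is usually written in the form of a fraction $a/b$. Thus, the fraction $a/b$, which only
makes sense when $b \neq 0$, is the unique solution of the equation $bx = a$. Operations
with fractions are subject to the rules:

\[
\begin{aligned}
\frac{a}{b} = \frac{c}{d} \ &\Leftrightarrow\ ad = bc, && b, d \neq 0, \\
\frac{a}{b} + \frac{c}{d} &= \frac{ad + bc}{bd}, && b, d \neq 0, \\
-\frac{a}{b} &= \frac{-a}{b} = \frac{a}{-b}, && b \neq 0, \\
\frac{a}{b} \cdot \frac{c}{d} &= \frac{ac}{bd}, && b, d \neq 0, \\
\left(\frac{a}{b}\right)^{-1} &= \frac{b}{a}, && a, b \neq 0.
\end{aligned}
\tag{10}
\]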


These are the usual rules of "grammar school arithmetic", but, rather than taking them on
faith and memorizing them, the reader should derive them from the axioms of a field;
this is not hard. For example, here is a derivation of the second rule in (10). Let $x = a/b$
and $y = c/d$ be the solutions of the equations $bx = a$ and $dy = c$. It follows from
these equations that $dbx = da$, $bdy = bc \Rightarrow bd(x + y) = da + bc \Rightarrow t = x + y = (da + bc)/bd$
is the unique solution of the equation $bdt = da + bc$.

A subfield of a field is a subring which itself is a field. For example, the field $\mathbb{Q}$ of rational numbers is a subfield of the field $\mathbb{R}$ of real numbers. If F is a subfield of a field K, then we also say that K is an extension field of F. It follows from the definition of a subfield that the zero and identity elements in K will also be contained in F, and will be the zero and identity elements of F. If in the field K we take the intersection $F_1$ of all subfields which contain F and also contain a given element $a \in K$ not in F, then $F_1$ will be the minimal field containing the set {F, a} (the reasoning is the same as for groups, see subsection 2 of §2). In this case we say that $F_1$ is the extension of F obtained by adjoining a to F, and we write $F_1 = F(a)$. Similarly, we speak of the subfield $F_1 = F(a_1, \ldots, a_n)$ of K obtained by adjoining to F the n elements $a_1, \ldots, a_n \in K$.

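For example, if we adjoin $\sqrt{2}$ to the subfield $\mathbb{Q} \subset \mathbb{R}$, the resulting field $\mathbb{Q}(\sqrt{2})$ is easily seen to be the set of numbers of the form $a + b\sqrt{2}$, $a, b \in \mathbb{Q}$, since $(\sqrt{2})^2 = 2$ and $1/(a + b\sqrt{2}) = a/(a^2 - 2b^2) - \bigl(b/(a^2 - 2b^2)\bigr)\sqrt{2}$ if $a + b\sqrt{2} \neq 0$. We have a similar situation for $\mathbb{Q}(\sqrt{3})$, $\mathbb{Q}(\sqrt{5})$, and so on.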
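The inverse formula for numbers a + b√2 can be checked with exact rational arithmetic. The following is a minimal sketch; the class name `QSqrt2` and its methods are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction


class QSqrt2:
    """Element a + b*sqrt(2) with rational a, b (hypothetical helper class)."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __mul__(self, other):
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2, using (√2)^2 = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # a^2 - 2b^2 is non-zero unless a = b = 0, because √2 is irrational
        n = self.a ** 2 - 2 * self.b ** 2
        return QSqrt2(self.a / n, -self.b / n)


x = QSqrt2(3, 5)          # 3 + 5√2
one = x * x.inverse()     # should be 1 + 0√2
```

The product of an element with its computed inverse comes out as the pair (1, 0), which is the statement that the set of numbers a + b√2 is closed under taking inverses.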

Two fields are said to be isomorphic if they are isomorphic considered as rings. By definition, if f is an isomorphism, then f(0) = 0 and f(1) = 1 (where we are using the same notation for the zero and identity elements in both rings). There is no such thing as a non-trivial homomorphism of fields with non-zero kernel, since

$$\operatorname{Ker} f \neq 0 \implies f(a) = 0,\ a \neq 0 \implies f(1) = f(aa^{-1}) = f(a)f(a^{-1}) = 0 \cdot f(a^{-1}) = 0 \implies f(b) = f(1 \cdot b) = f(1)f(b) = 0 \cdot f(b) = 0 \ \ \forall b \implies \operatorname{Ker} f = K.$$

The automorphisms of a field K, i.e., the isomorphisms from K to itself, are
connected with the deepest properties of fields, and are a powerful instrument for studying 
these properties. This subject is called Galois theory. 


The notion of a field extension is a reflection of the tendency of mankind, from time immemorial, to try to increase the supply of number systems to work with. This lengthy historical process can be roughly summarized by the diagram:

$$\{\text{one}\} \to \{\text{one plus one is two}\} \to \mathbb{N} \to \{\mathbb{N}, 0\} \to \mathbb{Z} \to \mathbb{Q} \to \mathbb{Q}(\sqrt{2}) \to \mathbb{R}.$$

It has continued right up to modern times, and has led to the study of a very extensive network of fields, which range far from the usual number systems of everyday use. Not all of the steps in constructing field extensions are purely algebraic. For example, we go from the rational numbers to the real numbers using the concepts of continuity and completeness (limits of Cauchy sequences), and this is normally done in an axiomatic calculus course. It turns out that there is a completely analogous construction of the so-called p-adic number fields, which we shall not deal with here. The resulting theory of p-adic analysis is the worthy offspring of three areas of mathematics -- number theory, algebra, and analysis.


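6. The characteristic of a field. In subsection 2 we constructed the finite ring $\mathbb{Z}_m$ of residue classes

$$\mathbb{Z}_m = \{\bar{0}, \bar{1}, \ldots, \overline{m-1}\}$$

with addition and multiplication operations $\bar{k} + \bar{l} = \overline{k + l}$, $\bar{k} \cdot \bar{l} = \overline{kl}$ (we are no longer using the special symbols ⊕ and ⊙). If m = st, s > 1, t > 1, then $\bar{s} \cdot \bar{t} = \bar{m} = \bar{0}$, i.e., $\bar{s}$ and $\bar{t}$ are zero divisors in $\mathbb{Z}_m$.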

Now suppose that m = p is a prime number. We claim that $\mathbb{Z}_p$ is a field (having p elements). In the cases p = 2, 3, this is immediately clear from the multiplication tables in subsection 2. In the general case, it is sufficient to show that, for every $\bar{s} \in \mathbb{Z}_p^*$, there exists an inverse element $\bar{s}'$ (where we only allow numbers s and s' which are not divisible by p). To see this, we consider the elements

$$\bar{s},\ 2\bar{s},\ \ldots,\ (p-1)\bar{s}. \tag{11}$$

They are all non-zero, since $s \not\equiv 0 \pmod{p} \implies ks \not\equiv 0 \pmod{p}$ for k = 1, 2, ..., p-1; this is because p is a prime number. For the same reason, all of the elements in (11) are distinct: $k\bar{s} = l\bar{s}$ for k < l would imply that $(l - k)\bar{s} = \bar{0}$, which is false. Thus, the sequence of elements (11) is the same, except for the order in which they are written, as the sequence

$$\bar{1},\ \bar{2},\ \ldots,\ \overline{p-1}.$$

In particular, there exists s', $1 \le s' \le p-1$, for which $s'\bar{s} = \bar{1}$. But this means that $\bar{s}' \cdot \bar{s} = \bar{1}$, i.e., $\bar{s}'$ is an inverse for $\bar{s}$. We have proved


THEOREM 5. The ring of residue classes $\mathbb{Z}_m$ is a field if and only if m is a prime number. □
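Theorem 5 and the argument leading to it can be tried out on small moduli. The sketch below is only an illustration by brute force; the function names are hypothetical, and `inverse_by_listing` follows the idea of listing the multiples s, 2s, ..., (p-1)s modulo p.

```python
def is_field(m):
    """True if every non-zero residue class mod m has a multiplicative inverse."""
    return all(any(s * t % m == 1 for t in range(1, m)) for s in range(1, m))


def is_prime(m):
    """Trial division; adequate for the small moduli used here."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


def inverse_by_listing(s, p):
    """List s, 2s, ..., (p-1)s mod p and locate the multiple equal to 1."""
    multiples = [k * s % p for k in range(1, p)]
    # For prime p and s not divisible by p, this is a rearrangement of 1..p-1.
    assert sorted(multiples) == list(range(1, p))
    return multiples.index(1) + 1


agreement = all(is_field(m) == is_prime(m) for m in range(2, 60))
```

For instance, with s = 3 and p = 7 the multiples are 3, 6, 2, 5, 1, 4, so the inverse of 3 modulo 7 is found in the fifth position.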


COROLLARY (Fermat's Little Theorem). For any integer a not divisible by the prime p,

$$a^{p-1} \equiv 1 \pmod{p}.$$

Proof. The multiplicative group $\mathbb{Z}_p^*$ has order p-1. By Lagrange's theorem (§3), p-1 is divisible by the order of any element in $\mathbb{Z}_p^*$, in particular, by the order of $\bar{a}$. Thus, $\bar{a}^{\,p-1} = \bar{1}$, i.e., $a^{p-1} \equiv 1 \pmod{p}$. □

An alternative proof of this corollary can be obtained by replacing s by a in (11) and multiplying together all of the elements in the sequence; that product must be congruent modulo p to the product of 1, 2, ..., p-1.
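Both proofs of the corollary can be illustrated numerically. This is a small sketch with hypothetical function names; it checks instances only and proves nothing.

```python
from math import prod


def fermat_holds(a, p):
    """Check a^(p-1) ≡ 1 (mod p) with modular exponentiation."""
    return pow(a, p - 1, p) == 1


def products_agree(a, p):
    """Alternative proof: a*1, a*2, ..., a*(p-1) mod p rearranges 1..p-1."""
    lhs = prod(a * k % p for k in range(1, p)) % p   # a^(p-1) * (p-1)!  mod p
    rhs = prod(range(1, p)) % p                       # (p-1)!  mod p
    return lhs == rhs


all_small_cases = all(fermat_holds(a, p)
                      for p in (2, 3, 5, 7, 11, 13)
                      for a in range(1, p))
```

Since (p-1)! is not divisible by p, the equality of the two products allows (p-1)! to be cancelled, which leaves the congruence of the corollary.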


Even though the fields $\mathbb{Z}_2, \mathbb{Z}_3, \mathbb{Z}_5, \ldots$ seem so different from the familiar field $\mathbb{Q}$, they have one basic property in common with $\mathbb{Q}$: they have no smaller subfields.


Let K be a field. As we already noticed, the intersection of any family of subfields of K is itself a subfield of K.

Definition. A field which does not contain any proper subfield is called a prime field.

THEOREM 6. Every field K contains precisely one prime field $K_0$. This prime field is isomorphic either to $\mathbb{Q}$ or to $\mathbb{Z}_p$ for some prime p.


Proof. If K contained two distinct prime fields K' and K'', their intersection would be a field (non-empty, since both K' and K'' contain 0 and 1) which is a proper subfield of at least one of K' and K''. But this is impossible if K' and K'' are prime fields. Thus, the prime field $K_0$, which is the intersection of all fields contained in K, is unique.

Since $K_0$ contains 1, it must contain all multiples $n \cdot 1 = 1 + \cdots + 1$. It follows from the general properties of addition and multiplication in rings (see the end of subsection 1) that

$$s \cdot 1 + t \cdot 1 = (s + t) \cdot 1, \qquad (s \cdot 1)(t \cdot 1) = (st) \cdot 1, \qquad s, t \in \mathbb{Z}. \tag{12}$$

Hence, the map f of the ring $\mathbb{Z}$ to K which is defined by $f(n) = n \cdot 1$ is a homomorphism. Its kernel is an ideal in $\mathbb{Z}$: Ker f = m$\mathbb{Z}$. If m = 0, then f is an isomorphism of $\mathbb{Z}$ onto its image, in which case the fractions $(s \cdot 1)/(t \cdot 1)$, $s, t \in \mathbb{Z}$, $t \neq 0$, which make sense because K is a field, form a field $K_0$ which is isomorphic to $\mathbb{Q}$. This field is clearly the prime field in K.


If, on the other hand, m > 0, then the map f* defined by setting

$$f^*: \bar{k} \mapsto k \cdot 1$$

is obviously an imbedding $\mathbb{Z}_m \to K$. By Theorem 5, this is only possible if m = p is a prime number. Thus, $f^*(\mathbb{Z}_p) \cong \mathbb{Z}_p$ is the prime subfield of K. □

Definition. We say that a field K has characteristic zero if its prime field $K_0$ is isomorphic to $\mathbb{Q}$. We say that K has characteristic p if $K_0 \cong \mathbb{Z}_p$. We write char K = 0 or char K = p > 0, respectively.


Instead of $\mathbb{Z}_p$ the notation $\mathbb{F}_p$ or GF(p) (GF for Galois field) is often used to denote the "abstract" field of p elements. It turns out that there exists a finite field GF(q) with $q = p^n$ elements for any prime p and positive integer n. We shall return to this interesting question in Chapter 9, but for now we merely give the example of the field of four elements {0, 1, α, β}:

GF(4):

     +  | 0  1  α  β          ·  | 0  1  α  β
    ----+------------        ----+------------
     0  | 0  1  α  β          0  | 0  0  0  0
     1  | 1  0  β  α          1  | 0  1  α  β
     α  | α  β  0  1          α  | 0  α  β  1
     β  | β  α  1  0          β  | 0  β  1  α

(At this point we are not concerned with what α and β are.) The reader should check, for example, that the distributive law holds.
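The check of the distributive law in GF(4) can also be done mechanically. The sketch below rests on an assumption not made in the text: the elements are modelled as pairs (c0, c1) standing for c0 + c1·α over the field of two elements, with α² = α + 1 and β = α + 1. All names are hypothetical.

```python
from itertools import product

# Assumed model: (c0, c1) stands for c0 + c1*α, with α^2 = α + 1 and β = α + 1.
ZERO, ONE, ALPHA, BETA = (0, 0), (1, 0), (0, 1), (1, 1)
ELEMENTS = [ZERO, ONE, ALPHA, BETA]


def add(x, y):
    """Coordinate-wise addition modulo 2."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


def mul(x, y):
    """(a + bα)(c + dα) = ac + (ad + bc)α + bd·α^2, reduced with α^2 = α + 1."""
    a, b = x
    c, d = y
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)


distributive = all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
                   for x, y, z in product(ELEMENTS, repeat=3))
every_nonzero_invertible = all(any(mul(x, y) == ONE for y in ELEMENTS)
                               for x in ELEMENTS if x != ZERO)
```

In this model α·α = β, α·β = 1 and β·β = α, so the three non-zero elements form a cyclic group of order 3.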

A field K of characteristic zero has the property that 1 has infinite order in the additive group of K. If K has characteristic p, then any non-zero element has order exactly p in the additive group:

$$x \in K \implies px = x + \cdots + x = (1 + \cdots + 1)x = (p \cdot 1)x = 0 \cdot x = 0.$$
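The statement about additive orders is easy to observe in the rings of residue classes. A minimal sketch, with a hypothetical function name, computing the additive order of a residue class by repeated addition:

```python
def additive_order(x, m):
    """Smallest n >= 1 such that n*x ≡ 0 (mod m)."""
    n, acc = 1, x % m
    while acc != 0:
        n += 1
        acc = (acc + x) % m
    return n


orders_mod_7 = [additive_order(x, 7) for x in range(1, 7)]
orders_mod_6 = [additive_order(x, 6) for x in range(1, 6)]
```

Modulo the prime 7 every non-zero class has order 7, in agreement with the text, whereas modulo 6 (not a field) the orders vary.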


7. A remark on linear systems. The time has come to cast a thoughtful glance at 
the theory of systems of linear equations and determinants developed in the earlier chapters. 
In those chapters the coefficients in the equations and the corresponding matrix entries were 
numbers (rational or real), but the exact nature of rational and real numbers was never used. 
There is nothing to stop us now from allowing the coefficients and matrix entries to be 
elements of any given field K. Of course, then the results must be formulated in terms 
of the field K: the components of a solution to the linear system and the values of the det 
function will lie in K. Gauss’ method for solving systems of linear equations, the theory 


of determinants, Cramer's rule, and so on, all remain valid for an arbitrary field K. 


Example 1. Suppose we are given a homogeneous system of linear equations AX = 0 with a square integer matrix $A = (a_{ij})$ of order 4 and column of unknowns $X = [x_1, x_2, x_3, x_4]$. A direct computation shows that $\det A = 2^3 \cdot 11^3$. Hence, if we consider $a_{ij}$, $x_j$ to be in K, where K is any field of characteristic zero or of characteristic $p \neq 2, 11$ (i.e., the integer entries 1, 2, 3, 4, -10, ..., 15 of A are replaced by their residue classes in characteristic p), then the system is determined, and so only has the trivial solution X = 0.


If char K = 2 (for example, if K = $\mathbb{Z}_2$), then we conclude from the congruence

A ≡ [matrix not legible in the source] (mod 2)

that the rank of the system is equal to two, and so the system has two independent solutions $X_1 = [1, 0, 1, 0]$ and $X_2 = [0, 1, 0, 1]$. To avoid confusion we should write $X_1 = [\bar{1}, \bar{0}, \bar{1}, \bar{0}]$, $X_2 = [\bar{0}, \bar{1}, \bar{0}, \bar{1}]$, but we now have had enough experience to adopt the simpler notation, ignoring the bars denoting residue class.


If char K = 11, then it follows from the congruence

$$A \equiv \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix} \pmod{11}$$

that the system has three independent solutions

$$X_1 = [9, 1, 0, 0], \quad X_2 = [8, 0, 1, 0], \quad X_3 = [7, 0, 0, 1].$$

As we see, the answer to our questions about the system depends on our field K, although the procedure is the same in all cases. Thus, one of the advantages of generalizing beyond $\mathbb{R}$ and $\mathbb{Q}$ to an arbitrary field is the avoidance of duplication of the same arguments.
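That one and the same elimination procedure works over every prime field can be sketched as follows. The matrix `A` below is a hypothetical example, not the matrix of the text; it was chosen only because it shows the same kind of behaviour (rank 2 modulo 2, rank 1 modulo 11, non-singular modulo other small primes). The function name is likewise an assumption.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix after reducing its entries modulo the prime p."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # inverse in F_p (Python 3.8+)
        m[rank] = [x * inv % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(x - factor * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical 4x4 integer matrix (an assumption made for illustration only).
A = [[1, 2, 3, 4],
     [12, 13, 14, 15],
     [1, -9, 3, -7],
     [-10, 2, 14, 4]]

ranks = {p: rank_mod_p(A, p) for p in (2, 3, 7, 11)}
```

The only field-specific step is the computation of the inverse of the pivot, which is exactly where the field axioms are used.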

But there are even more important advantages. 

When we spoke about the general linear group, we have so far meant the group of all non-singular matrices with coefficients in $\mathbb{Q}$ or $\mathbb{R}$. We now let $M_n(K)$ denote the ring of n×n matrices with entries in an arbitrary field K, and we define the general linear group GL(n,K) to be the subset of all non-singular matrices $A \in M_n(K)$ (i.e., matrices with det A ≠ 0). If we vary K, for example, if we take $K = \mathbb{F}_p$, many important groups arise in a natural way (see Ch. 7).

Fields of the type $\mathbb{R}$, $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, etc. are usually called number fields. The field $\mathbb{F}_p$ is an example of a field which is not a number field. We do not think of $\mathbb{F}_p$ as a "number field", even though its elements can be identified with the elements of the set {0, 1, ..., p-1}.

In §2 Ch. 1 we posed the problem (Problem 3) of using finite fields in coding 


theory. Here is a small example related to this theme. 


Example 2. In order to transmit the word PEACE, in principle it is sufficient to use four elementary message units P = (0,0), E = (1,0), A = (0,1), C = (1,1), which we interpret as row-vectors in the vector space $\mathbb{F}_2^2$ over the field $\mathbb{F}_2 = \mathbb{Z}_2 = \{0, 1\}$ containing two elements. But suppose some static arises during the transmission, some interference which occasionally switches a 0 and a 1. This could cause the receiver to pick up, for example, the message APACE. According to a fundamental theorem of Shannon, the static can always be overcome at the cost of increasing the length of the elementary message units (i.e., lengthening the time of transmission). Suppose, for example, that we know that there can be no more than one distortion in every elementary message unit of length five. Then take a subset $S_0$ of so-called code vectors in the vector space $S = \mathbb{F}_2^5$, say: $S_0$ = {P, E, A, C} with A = (0,1,1,0,1), C = (1,1,0,0,0) [the coordinates of P and E are not legible in the source]. From the table

[Table: the code vectors, and under each of them the possible vectors obtained from the code vectors as a result of the distortions]

it is clear that the sets of distorted vectors in the different columns do not overlap, and, hence, accurate decoding is possible, i.e., the true message can be re-established.
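The non-overlapping of the sets of distorted vectors can be verified mechanically. In the sketch below the code vectors for A and C are the ones given in the example, while those for P and E are hypothetical choices made for this illustration (picked so that any two code vectors differ in at least three coordinates); all function names are assumptions as well.

```python
from itertools import combinations

CODE = {
    "P": (0, 0, 1, 1, 0),   # hypothetical choice, not taken from the text
    "E": (1, 0, 0, 1, 1),   # hypothetical choice, not taken from the text
    "A": (0, 1, 1, 0, 1),
    "C": (1, 1, 0, 0, 0),
}


def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def distortions(v):
    """The vector v together with all vectors obtained by switching one coordinate."""
    return {v} | {v[:i] + (1 - v[i],) + v[i + 1:] for i in range(len(v))}


def decode(received):
    """Return the letter whose code vector is closest to the received vector."""
    return min(CODE, key=lambda letter: hamming(CODE[letter], received))


min_distance = min(hamming(u, v) for u, v in combinations(CODE.values(), 2))
columns_disjoint = all(distortions(u).isdisjoint(distortions(v))
                       for u, v in combinations(CODE.values(), 2))
```

With minimum distance 3, a single switched coordinate leaves the received vector strictly closer to the transmitted code vector than to any other, which is why nearest-vector decoding recovers the letter.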


We have obtained a code $S_0$ which can correct one error. If we use $\mathbb{F}_2^n$ for n sufficiently large, we can construct a similar code which can be used to transmit the Latin alphabet, and hence any text, accurately. If we want to avoid unnecessarily long and cumbersome decoding, we should choose $S_0$ carefully. There are many techniques for doing this, including some purely algebraic approaches based on using the finite field $\mathbb{F}_q$.

EXERCISES


1. Developing the idea in Example 2) of §1, show that the set $\mathcal{P}(\Omega)$ with the operations

$$A \oplus B = (A \cup B) \setminus (A \cap B), \qquad A \odot B = A \cap B, \qquad A, B \in \mathcal{P}(\Omega),$$

is a ring with unit in which all of the elements of the additive group have order two.

2. Prove that any ring in which $x^2 = x$ for every element x must be commutative. Is this true if $x^3 = x$ for every x?

3. Are the fields $\mathbb{Q}(\sqrt{2})$ and $\mathbb{Q}(\sqrt{5})$ isomorphic?


4. Do the non-invertible elements of the following rings form an ideal: 1) Zi 6, g 


5. Show that the image of a commutative ring under an epimorphism is a commutative ring.

6. Show that, if R is a ring with unit and L is an ideal, then the quotient ring R/L also has a unit element.

7. Show that any finite integral domain is a field.

8. Let p be a prime number, and let R be a commutative ring with unit such that px = 0 for all $x \in R$. Show that then

$$(x + y)^{p^m} = x^{p^m} + y^{p^m}, \qquad m = 1, 2, 3, \ldots$$


9. Prove that a ring consisting of five elements is either a ring with zero multiplication or is isomorphic to $\mathbb{Z}_5$.

10. The set $T = \left\{ \begin{pmatrix} a & b \\ 0 & c \end{pmatrix} \,\middle|\, a, b, c \in \mathbb{Z} \right\}$ of upper triangular matrices is a subring in $M_2(\mathbb{Z})$. Prove this, and give a description of all of the ideals of the ring T.

11. A non-zero element x in a ring R is called nilpotent if $x^n = 0$ for some $n \in \mathbb{N}$. Show that:

(i) if R is a ring with unit and x is nilpotent, then the element 1 - x is invertible;

(ii) the ring $\mathbb{Z}_m = \mathbb{Z}/m\mathbb{Z}$ contains nilpotent elements if and only if m is divisible by the square of a natural number greater than 1.

12. Prove that, in a commutative ring with unit R of infinite cardinality, there cannot be a finite number $n \ge 1$ of non-invertible non-zero elements.

13. Let R be any associative ring with unit 1, and let $a, b \in R$. Show that

$$(1 - ab)c = 1 = c(1 - ab) \implies (1 - ba)d = 1 = d(1 - ba),$$

where d = 1 + bca, i.e., if 1 - ab is invertible in R, then so is 1 - ba. What does the element 1 + adb equal?

14. Show that the matrices $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$ with $a, b \in \mathbb{Z}_3$ form a field of 9 elements, and that the multiplicative group of this field is cyclic of order 8.

15. Can the code $S_0$ in Example 2 at the end of the section correct two errors?


Chapter 5. Complex Numbers 
and Polynomials 


In this chapter we study some very concrete algebraic systems, which are somewhat familiar from elementary mathematics, but which deserve more detailed examination. The point of view developed in the preceding chapter allows us to take a fresh look at what was the traditional domain of algebra in earlier times. Meanwhile, such concepts as ring extensions and unique factorization in integral domains become more tangible and understandable by looking at the example of polynomial rings.


§1. The field of complex numbers

The history of mathematics saw a stubborn and protracted struggle between the supporters and detractors of the "imaginary" numbers that arise from the algebraic equation

$$x^2 + 1 = 0. \tag{1}$$

One could simply agree to write the solutions of (1) in the form $\pm\sqrt{-1}$ as a formal notational convention. But this is not a satisfactory answer to the problem; we would like to attach some meaning to the notation. We shall discuss this on several levels. First we give some heuristic considerations.


1. An auxiliary construction. We would like to extend the field of real numbers $\mathbb{R}$ in such a way that equation (1) has a solution in the new field. As a model (example) of such a field extension, let us take the set K of all square matrices

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix} \in M_2(\mathbb{R}). \tag{2}$$

We claim that K is a field (compare with Exercise 14 in §4 Ch. 4). First of all, K contains the zero 0 and the identity E of the ring $M_2(\mathbb{R})$. Next, the relations

$$
\begin{aligned}
\begin{pmatrix} a & b \\ -b & a \end{pmatrix} + \begin{pmatrix} c & d \\ -d & c \end{pmatrix} &= \begin{pmatrix} a + c & b + d \\ -(b + d) & a + c \end{pmatrix}, \\
-\begin{pmatrix} a & b \\ -b & a \end{pmatrix} &= \begin{pmatrix} -a & -b \\ b & -a \end{pmatrix}, \\
\begin{pmatrix} a & b \\ -b & a \end{pmatrix} \begin{pmatrix} c & d \\ -d & c \end{pmatrix} &= \begin{pmatrix} ac - bd & ad + bc \\ -(ad + bc) & ac - bd \end{pmatrix}
\end{aligned}
\tag{3}
$$

imply that K is closed under addition and multiplication. The associativity of these operations follows from the associativity of addition and multiplication in the ring $M_2(\mathbb{R})$. The same goes for the distributive laws. Thus, K is a subring of $M_2(\mathbb{R})$; it is commutative by (3). It remains to show that any matrix (2) with a and b not both zero has an inverse in K. (Note that the determinant of the matrix is $a^2 + b^2 \neq 0$.) Either by using the formula for the entries in the inverse matrix (see Theorem 1 of §3 Ch. 3) or by solving the linear system which comes from the condition

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix} \begin{pmatrix} x & y \\ -y & x \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

we find that

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}^{-1} = \frac{1}{a^2 + b^2} \begin{pmatrix} a & -b \\ b & a \end{pmatrix}. \tag{4}$$

Thus, K is a field.


Using the rule (5) in §3 Ch. 2 for multiplying matrices by scalars, we can write any element of K in the form

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix} = aE + bJ, \quad \text{where } a, b \in \mathbb{R}, \ J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \tag{5}$$

The field K contains the subfield $\{aE \mid a \in \mathbb{R}\} \cong \mathbb{R}$, and the relation

$$J^2 + E = 0$$

shows that "up to isomorphism" the element $J \in K$ is a solution of equation (1). So there is no need for any mysticism about J being an "imaginary quantity".
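The matrix model can be compared with the complex arithmetic built into Python. This is only a sketch; the helper names `to_matrix` and `mat_mul` are hypothetical, and matrices are represented as nested tuples.

```python
def to_matrix(z):
    """Send the complex number a + bi to the matrix ((a, b), (-b, a))."""
    a, b = z.real, z.imag
    return ((a, b), (-b, a))


def mat_mul(X, Y):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2))
                 for i in range(2))


J = to_matrix(1j)                     # the matrix ((0, 1), (-1, 0))
J_squared = mat_mul(J, J)             # should equal -E

z, w = 2 + 3j, -1 + 4j
product_matches = to_matrix(z * w) == mat_mul(to_matrix(z), to_matrix(w))
```

The small integer values used here are represented exactly in floating point, so the comparison of matrices is exact; for general real entries a tolerance would be needed.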

But it is not the field K which is called the field of complex numbers, but rather a certain field isomorphic to it whose elements are identified with the points of a plane. It is natural to want to think of K in terms of a geometrical realization; after all, the real number field $\mathbb{R}$ is inseparable in our minds from the number line, i.e., from a line having a point denoted 0 and a fixed scale of distance to the point 1.


2. The complex plane. Thus, we would like to construct a field $\mathbb{C}$ whose elements are the points of the plane $\mathbb{R}^2$, having addition and multiplication operations which satisfy the field axioms, and solving our problem of being able to solve equation (1). On the cartesian plane, take the usual rectangular coordinate system with x- and y-axes. We write (a,b) for the point with x-coordinate a and y-coordinate b. We define the sum and product of two points (a,b) and (c,d) according to the rules

$$(a, b) + (c, d) = (a + c,\ b + d), \qquad (a, b) \cdot (c, d) = (ac - bd,\ ad + bc) \tag{6}$$

(we use the same symbols + and · as in the field $\mathbb{R}$, but this should not cause confusion). We could verify by a straightforward but tedious series of calculations that these operations satisfy all of the field axioms, i.e., the operations (6) make the set of pairs (points in the plane) into a field. But happily, there is no need for this verification, because if we let each point of the plane $\mathbb{C}$ correspond to an element of our earlier field K as follows

$$(a, b) \mapsto \begin{pmatrix} a & b \\ -b & a \end{pmatrix},$$

then by comparing (3) and (6) we see that the set $\mathbb{C}$ is a field isomorphic to K. $\mathbb{C}$ is called the field of complex numbers. Because of the geometrical realization of $\mathbb{C}$ it is also often called the complex plane.

The points on the x-axis, i.e., the points of the form (a,0), have the same properties as the real number line, so we set (a,0) = a. Then the zero (0,0) and the identity (1,0) remain the same as on the real number line. The point (0,1) on the y-axis is traditionally denoted i (the "imaginary" unit); it is a root of equation (1), since $i^2 = (0,1)(0,1) = (-1,0) = -1$. An arbitrary complex number z = (x,y) can now be written in the customary form

$$z = x + iy, \qquad x, y \in \mathbb{R}, \tag{7}$$

which is very close to the form (5) for the elements of the field K. Note that $\mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}$. Hence, $\mathbb{C}$ is a field of characteristic zero (see subsection 6 of §4 Ch. 4).


3. Geometrical interpretation of operations with complex numbers. The x-axis in the complex plane is usually called the real axis, and the y-axis is called the imaginary axis. A number iy on the imaginary axis is called a purely imaginary number, although the word "imaginary" has lost its original meaning, which implied that such numbers are less legitimate than real numbers. If z is a complex number written in the form (7), x is called its real part and y is called its imaginary part. The map that associates to every complex number z = x + iy the complex number $\bar{z} = x - iy$ (called the complex conjugate of z) is called complex conjugation. Geometrically, it amounts to reflection of the complex plane about the real axis (Fig. 15). A very remarkable and important fact is

THEOREM 1. The map $z \mapsto \bar{z}$ is an automorphism of the field $\mathbb{C}$ having order 2 which keeps all real numbers fixed. The sum and the product of a number with its conjugate are real numbers.


Proof. It follows immediately from the definition of complex conjugation that $\bar{x} = x$ for $x \in \mathbb{R}$. In particular, $\bar{0} = 0$ and $\bar{1} = 1$. The claim that complex conjugation has order 2 (i.e., performing it twice gives the identity map) is also obvious: $\bar{\bar{z}} = z$. We must now verify that

$$\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2, \qquad \overline{z_1 z_2} = \bar{z}_1 \bar{z}_2; \tag{8}$$

but this follows immediately from the formulas (6), which should be rewritten in the form

$$
\begin{aligned}
(x_1 + iy_1) + (x_2 + iy_2) &= (x_1 + x_2) + i(y_1 + y_2), \\
(x_1 + iy_1) \cdot (x_2 + iy_2) &= (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + x_2 y_1).
\end{aligned}
\tag{9}
$$

Finally, as a special case of the formulas in (9), we find the sum and product of z = x + iy and $\bar{z} = x - iy$: $z + \bar{z} = 2x$, $z\bar{z} = x^2 + y^2$. □

Remark. Among all of the automorphisms of the field $\mathbb{C}$ (of which there are many), complex conjugation is the only continuous automorphism other than the identity (i.e., it takes nearby points in the plane $\mathbb{C}$ to nearby points). We shall not give a precise definition or proof of this assertion.


The modulus (or absolute value) of a complex number z = x + iy is the non-negative real number $|z| = \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}$. The position of a point z on the plane is completely determined by giving its distance r = |z| from the origin (0,0) and the angle φ measured counterclockwise from the positive real axis to the line from (0,0) to z (see Fig. 15); these are the well-known polar coordinates. The angle φ is called the argument of z and is denoted arg z. Although arg z can take any positive or negative value, for given r the angles which differ by integer multiples of 2π correspond to one and the same point. The argument of 0 is not defined, but its modulus is |0| = 0. Note that the relationship "greater than" or "less than" makes no sense for complex numbers, i.e., they cannot be related to one another by an inequality sign: unlike in the case of real numbers, whose arguments can only take the value 0 (for positive numbers) or π (for negative numbers), complex numbers are not ordered.

The polar coordinates r and φ determine x and y by the well-known formulas

$$x = r\cos\varphi, \qquad y = r\sin\varphi, \qquad z = r(\cos\varphi + i\sin\varphi). \tag{10}$$

This is the so-called trigonometric form for z.

The operation of adding two complex numbers z and z' is expressed easily in cartesian coordinates, namely, by the parallelogram rule, which tells us how to add directed line segments (vectors) from the origin (see Fig. 16). We obtain an important inequality if we look at this picture and compare the sides of the triangle with vertices 0, z and z + z' (and note that the lengths of the sides are given by absolute values of the corresponding complex numbers):

$$|z + z'| \le |z| + |z'|. \tag{11}$$

Note that the inequality (11), which can also be written in more general form

$$\bigl|\,|z| - |z'|\,\bigr| \le |z \pm z'| \le |z| + |z'|,$$

is completely analogous to a similar inequality for real numbers.


The operation of multiplying two complex numbers can be conveniently expressed in polar coordinates.

THEOREM 2. The modulus of the product of complex numbers is equal to the product of the moduli, and the argument of the product is equal to the sum of the arguments:

$$|zz'| = |z| \cdot |z'|, \qquad \arg zz' = \arg z + \arg z'. \tag{12}$$

Similarly,

$$|z/z'| = |z| / |z'|, \qquad \arg(z/z') = \arg z - \arg z'.$$

Proof. Suppose that the trigonometric forms (10) for z and z' are

$$z = r(\cos\varphi + i\sin\varphi), \qquad z' = r'(\cos\varphi' + i\sin\varphi').$$

Using (9) or simply multiplying directly, we obtain

$$zz' = rr'[(\cos\varphi\cos\varphi' - \sin\varphi\sin\varphi') + i(\cos\varphi\sin\varphi' + \sin\varphi\cos\varphi')],$$

and, using the well-known formulas from trigonometry, we can re-write this as the trigonometric form for zz':

$$zz' = |z| \cdot |z'| \cdot [\cos(\varphi + \varphi') + i\sin(\varphi + \varphi')].$$

Next, if z'' = z/z', then z = z'z'', and, using (12) for the product z'z'', we obtain the desired formula for the fraction z/z'. □
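Theorem 2 can be observed numerically with the standard `cmath` module. A minimal sketch, with a hypothetical function name; since arguments are determined only up to multiples of 2π, the comparison is made on the unit circle rather than between raw angles.

```python
import cmath
import math


def check_product_rule(z, w):
    """Compare the polar data of z*w with |z||w| and arg z + arg w."""
    r1, phi1 = cmath.polar(z)
    r2, phi2 = cmath.polar(w)
    r, phi = cmath.polar(z * w)
    moduli_agree = math.isclose(r, r1 * r2)
    # Compare the points on the unit circle determined by the two angles.
    args_agree = abs(cmath.rect(1, phi) - cmath.rect(1, phi1 + phi2)) < 1e-12
    return moduli_agree, args_agree


result = check_product_rule(-1 + 1j, -2 - 3j)
```

Here `cmath.polar` returns the pair (modulus, argument) with the argument taken in the interval from -π to π, so the sum of two arguments may differ from the argument of the product by 2π.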


In particular, $z^{-1} = |z|^{-1}[\cos(-\varphi) + i\sin(-\varphi)]$. To find $z^{-1}$ in the complex plane (see Fig. 17), first find the point z' which is obtained from z by "inversion" with respect to the unit circle, and then reflect z' about the real axis. ("Inversion" means that the distance from z' to 0 is the reciprocal of the distance from z to 0.)

Actually, the assertions about the modulus of a product and the modulus of a sum can also be easily derived from Theorem 1. In the first place, we have

$$|zz'|^2 = zz'\,\overline{zz'} = zz'\bar{z}\bar{z}' = z\bar{z} \cdot z'\bar{z}' = |z|^2 |z'|^2,$$

and hence $|zz'| = |z| \cdot |z'|$. Next, since $|z| = \sqrt{x^2 + y^2} \ge \sqrt{x^2} = |x|$, we obtain

$$|1 + z|^2 = (1 + z)(1 + \bar{z}) = 1 + (z + \bar{z}) + z\bar{z} = 1 + 2x + |z|^2 \le 1 + 2|z| + |z|^2 = (1 + |z|)^2,$$

and hence $|1 + z| \le 1 + |z|$. Now if z ≠ 0 and z' ≠ 0, it follows that

$$|z + z'| = |z(1 + z^{-1}z')| = |z| \cdot |1 + z^{-1}z'| \le |z|(1 + |z^{-1}z'|) = |z|(1 + |z|^{-1}|z'|) = |z| + |z'|.$$


The above results lead us to the general conclusion: the usual form (7) for complex 
numbers is convenient for expressing additive properties, and the trigonometric form (10) 
is best suited for multiplicative properties. If we violate this principle, we end up with 


very complicated formulas which obscure what's going on. 


4. Raising to powers and extracting roots. The formula (12) for multiplying complex numbers given in trigonometric form implies the following de Moivre's formula:

$$[r(\cos\varphi + i\sin\varphi)]^n = r^n(\cos n\varphi + i\sin n\varphi), \tag{13}$$

which holds for all $n \in \mathbb{Z}$ (another way of writing it: $|z^n| = |z|^n$, $\arg z^n = n \arg z$). If we use the special case of (13) when r = 1, together with the binomial formula (1) of §7 Ch. 1 and the relations

$$i^2 = -1, \quad i^3 = -i, \quad i^4 = 1, \quad i^{4k+l} = i^l,$$

we obtain formulas for the cosines and sines of multiples of an angle:

$$
\begin{aligned}
\cos n\varphi &= \sum_{k \ge 0} (-1)^k \binom{n}{2k} \cos^{n-2k}\varphi \, \sin^{2k}\varphi, \\
\sin n\varphi &= \sum_{k \ge 0} (-1)^k \binom{n}{2k+1} \cos^{n-2k-1}\varphi \, \sin^{2k+1}\varphi.
\end{aligned}
\tag{14}
$$
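The multiple-angle formulas (14) can be compared with direct evaluation for non-negative n. The following is a sketch with hypothetical function names; it uses `math.comb` (Python 3.8+) for the binomial coefficients.

```python
from math import comb, cos, sin, isclose


def cos_multiple(n, phi):
    """Right-hand side of the first formula in (14), for n >= 0."""
    return sum((-1) ** k * comb(n, 2 * k)
               * cos(phi) ** (n - 2 * k) * sin(phi) ** (2 * k)
               for k in range(n // 2 + 1))


def sin_multiple(n, phi):
    """Right-hand side of the second formula in (14), for n >= 0."""
    return sum((-1) ** k * comb(n, 2 * k + 1)
               * cos(phi) ** (n - 2 * k - 1) * sin(phi) ** (2 * k + 1)
               for k in range((n - 1) // 2 + 1))


phi = 0.7
formulas_agree = all(
    isclose(cos_multiple(n, phi), cos(n * phi), abs_tol=1e-9)
    and isclose(sin_multiple(n, phi), sin(n * phi), abs_tol=1e-9)
    for n in range(9)
)
```

For n = 2 the sums reduce to the familiar identities cos 2φ = cos²φ - sin²φ and sin 2φ = 2 sin φ cos φ.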
Remark. Let $e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$. In a first course on functions of a complex variable it is proved (by power series expansion) that $e^{\alpha}$ for complex α can be evaluated using Euler's formula

$$e^{i\varphi} = \cos\varphi + i\sin\varphi, \tag{15}$$

which, in turn, can be used to derive all of the results above. One need merely note that

$$e^{i\varphi} e^{i\varphi'} = e^{i\varphi + i\varphi'} = e^{i(\varphi + \varphi')}.$$

Thus, the trigonometric form for a complex number z amounts to writing: $z = |z| \cdot e^{i\varphi}$.

We would next like to learn how to extract roots of complex numbers, and the first question that arises is whether or not it is always possible to extract an arbitrary root of any complex number. It turns out that it is always possible, and de Moivre's formula essentially gives a complete answer to the problem. Suppose we are given a complex number z = r(cos φ + i sin φ), and we would like to find a number z' = r'(cos φ' + i sin φ') such that $(z')^n = z$. Using de Moivre's formula to express $(z')^n$ and then equating the moduli and arguments in both sides of the equation $(z')^n = z$, we find that $(r')^n = r$ and nφ' = φ + 2πk (where we have to add the term 2πk because φ is only determined up to a multiple of 2π). Thus,

$$r' = \sqrt[n]{r}, \qquad \varphi' = \frac{\varphi + 2\pi k}{n}$$

(by $\sqrt[n]{r}$ we mean the positive root of the positive real number r). Thus, the root $\sqrt[n]{z}$ exists but is not uniquely determined. For each k = 0, 1, ..., n-1 we obtain a different value for z'; but these n values are all possible roots, since if we write k = nq + r, $0 \le r \le n-1$, then we have

$$\frac{\varphi + 2\pi k}{n} = \frac{\varphi + 2\pi r}{n} + 2\pi q.$$

We have proved 


THEOREM 3. It is always possible to extract an n-th root of a complex number z = |z|(cos φ + i sin φ). For z ≠ 0 there are n n-th roots of z, which are located at the vertices of a regular n-gon inscribed in the circle centered at the origin with radius $\sqrt[n]{|z|}$:

$$\sqrt[n]{z} = \sqrt[n]{|z|} \left( \cos\frac{\varphi + 2\pi k}{n} + i\sin\frac{\varphi + 2\pi k}{n} \right), \qquad k = 0, 1, \ldots, n-1. \tag{16}$$

COROLLARY. The n-th roots of 1 are given by the formula

$$\varepsilon_k = \cos\frac{2\pi k}{n} + i\sin\frac{2\pi k}{n}, \qquad k = 0, 1, \ldots, n-1. \tag{17}$$

They are located at the vertices of a regular n-gon inscribed in the unit circle. □
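Formula (16) translates directly into a short computation. A minimal sketch using the standard `cmath` module; the function name `nth_roots` is hypothetical.

```python
import cmath
import math


def nth_roots(z, n):
    """All n-th roots of z, following formula (16) with k = 0, ..., n-1."""
    r, phi = cmath.polar(z)
    return [cmath.rect(r ** (1 / n), (phi + 2 * math.pi * k) / n)
            for k in range(n)]


cube_roots = nth_roots(-8, 3)
all_are_roots = all(abs(w ** 3 - (-8)) < 1e-9 for w in cube_roots)
same_modulus = all(math.isclose(abs(w), 2.0) for w in cube_roots)
```

The three cube roots of -8 all have modulus 2, so they lie on the circle of radius 2 about the origin, and exactly one of them (namely -2) is real.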


It is immediately clear from (16) and (17) that there will be either zero, one, or two real roots $\sqrt[n]{z}$, and either one or two real roots $\sqrt[n]{1}$.

An n-th root of 1 is called primitive if it is not a root of 1 of lower degree. Write $\varepsilon_k$ for the root with index k in (17). For example, the following are primitive n-th roots of 1:

$$\varepsilon = \varepsilon_1 = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}, \qquad \varepsilon^{-1} = \varepsilon_{n-1}.$$

Any n-th root of 1 is a power of the primitive one: $\varepsilon_k = \varepsilon^k$, as we can again see from de Moivre's formula. Moreover, $\varepsilon_k \varepsilon_l = \varepsilon_{k+l}$ if we consider k + l modulo n. In particular, $\varepsilon_k^{-1} = \varepsilon_{n-k}$ and $\varepsilon_0 = 1$. Having studied group theory, we now conclude that the n-th roots of 1 form a cyclic group ⟨ε⟩ of order n.

This gives us another model for the cyclic group of order n. By Theorem 6 of Ch. 4, the subgroups of this cyclic group are in one-to-one correspondence with the positive divisors d of n. For each d | n there is exactly one subgroup of ⟨ε⟩ having order d, and this is the subgroup $\langle \varepsilon^{n/d} \rangle$. A root $\varepsilon^m$ is primitive if and only if $\langle \varepsilon^m \rangle = \langle \varepsilon \rangle$, i.e., if and only if Card $\langle \varepsilon^m \rangle = n$, and this holds if and only if m and n are relatively prime. For example, if n = 12, then the primitive roots are $\varepsilon$, $\varepsilon^5$, $\varepsilon^7$, and $\varepsilon^{11}$. If n = p is a prime, then all n-th roots of 1 except for 1 are primitive. From the algebraic point of view (disregarding their location on the complex plane), all of the primitive roots of a given degree n are equivalent to one another.
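The criterion for primitivity can be turned into a one-line computation. A small sketch with hypothetical function names: it lists the exponents m for which the m-th power of ε is a primitive n-th root of 1, and computes the order of an arbitrary power.

```python
from math import gcd


def primitive_exponents(n):
    """Exponents m in 0..n-1 with gcd(m, n) = 1, i.e. ε^m primitive."""
    return [m for m in range(n) if gcd(m, n) == 1]


def order_of_power(m, n):
    """Order of ε^m in the cyclic group of n-th roots of 1: n / gcd(m, n)."""
    return n // gcd(m, n)


exponents_12 = primitive_exponents(12)
exponents_7 = primitive_exponents(7)
```

For n = 12 this gives the exponents 1, 5, 7, 11 of the example in the text, and for the prime 7 every exponent from 1 to 6 appears.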


Returning now to the problem of extracting the n-th roots of an arbitrary non-zero complex number z, we note that, if z' is any fixed root (for example, $z' = \sqrt[n]{|z|}\left(\cos\frac{\varphi}{n} + i\sin\frac{\varphi}{n}\right)$), then all of the other roots have the form $z'\varepsilon^k$, k = 0, 1, ..., n-1. This agrees with formula (16).


5. Uniqueness theorem. We are not yet ready to appreciate fully the advantage $\mathbb{C}$ has over $\mathbb{R}$, but already the fact that $\mathbb{C}$ contains all roots of 1 justifies our taking a special interest in the complex numbers. A natural question which arises is whether there are other fields having the same properties as $\mathbb{C}$. It turns out that we have the following uniqueness theorem for complex numbers.

THEOREM 4. Let F be a field isomorphic to $\mathbb{R}$ (for example, F = $\mathbb{R}$), and let K be an extension obtained from F by adjoining a root j of the equation $x^2 + 1 = 0$. Then K is isomorphic to $\mathbb{C}$.


Proof. By the definition in subsection 5 of §4 Ch. 4, K = F(j) is the minimal subfield of some field L containing F and j. Since we were given the field L, we can consider elements of the form a + jb, $a, b \in F$, where the sum and product are taken in the sense of the operations in L. Distinct pairs $a, b \in F$ correspond to distinct elements a + jb, since otherwise we would have an element a' + jb' = 0 with a' ≠ 0 or b' ≠ 0. If b' = 0, then clearly a' = 0. But if b' ≠ 0, then we would obtain $j = -a'/b' \in F$, which is absurd, since $F \cong \mathbb{R}$ and $\mathbb{R}$ does not contain a solution to the equation $x^2 + 1 = 0$; thus, $j \notin F$. Next, we use the equality $j^2 = -1$ and the operations in L to compute

$$
\begin{aligned}
(a_1 + jb_1) + (a_2 + jb_2) &= (a_1 + a_2) + j(b_1 + b_2), \\
(a_1 + jb_1)(a_2 + jb_2) &= (a_1 a_2 - b_1 b_2) + j(a_1 b_2 + a_2 b_1).
\end{aligned}
\tag{18}
$$

In addition,

$$(a + jb)^{-1} = \frac{a}{a^2 + b^2} - j\,\frac{b}{a^2 + b^2} \qquad \text{if } a^2 + b^2 \neq 0.$$

This shows that the set $\{a + jb \mid a, b \in F\}$ in K is closed under all of the operations in L and so forms a subfield. Since K is the minimal such field, we must have

$$K = \{a + jb \mid a, b \in F\}.$$

In addition, the formulas (18) coincide with (9).

If $f: F \to \mathbb{R}$ is our given isomorphism, then the map

$$f^*: a + jb \mapsto f(a) + i f(b),$$

which takes each element of K to the point of the complex plane $\mathbb{C}$ with coordinates (f(a), f(b)), is an isomorphism between K and $\mathbb{C}$, because of the above formulas. □


In Subsection 1 we constructed such a field $K$ in $M_2(\mathbb{R})$. But there are many constructions for a field obtained by solving the equation $x^2 + 1 = 0$; we shall give one more such construction in the next section. By Theorem 4, all such fields are isomorphic.

Note that in the statement of the theorem we really should have written $x^2 + \tilde 1 = \tilde 0$, where $\tilde 1$ and $\tilde 0$ are the identity and zero elements in $F$. For example, in the case of $K \subset M_2(\mathbb{R})$ we have $J^2 + \tilde 1 = \tilde 0$, where $\tilde 1 = E$ and $\tilde 0$ is the zero matrix.


There are many other subfields of $\mathbb{C}$ besides $\mathbb{Q}$ and $\mathbb{R}$. Especially interesting examples are the extensions of $\mathbb{Q}$ obtained by adjoining some element of $\mathbb{C}$ not in $\mathbb{Q}$.

Example 1 (quadratic fields). Let $d$ be a non-zero integer (positive or negative) such that $\sqrt d \notin \mathbb{Q}$. The field $\mathbb{Q}(\sqrt d) \subset \mathbb{C}$ is called a real quadratic field if $d > 0$ and an imaginary quadratic field if $d < 0$. The field $\mathbb{Q}(\sqrt 2)$ was discussed briefly in §4 of Chapter 4. If we use the same argument as in the proof of Theorem 4, with $j$ replaced by $\sqrt d$ and the relation $j^2 = -1$ replaced by $(\sqrt d)^2 = d$, we find that

$$\mathbb{Q}(\sqrt d) = \{a + b\sqrt d \mid a, b \in \mathbb{Q}\}.$$


In particular, in place of (18) we have

$$(a_1 + b_1\sqrt d) + (a_2 + b_2\sqrt d) = (a_1 + a_2) + (b_1 + b_2)\sqrt d,$$
$$(a_1 + b_1\sqrt d)(a_2 + b_2\sqrt d) = (a_1a_2 + b_1b_2d) + (a_1b_2 + a_2b_1)\sqrt d. \tag{19}$$

In addition,

$$(a + b\sqrt d)^{-1} = \frac{a}{a^2 - db^2} - \frac{b}{a^2 - db^2}\sqrt d$$

if $a + b\sqrt d \ne 0$ (i.e., if $a$ and $b$ are not both zero).

Using (19), we easily verify that the map

$$f : a + b\sqrt d \mapsto a - b\sqrt d$$

is an automorphism of the field $\mathbb{Q}(\sqrt d)$ (the analog of complex conjugation). By the norm of a number $\alpha = a + b\sqrt d$ we mean the real number

$$N(\alpha) = a^2 - db^2 = \alpha f(\alpha).$$

It is obvious that $N(\alpha) = 0 \iff \alpha = 0$. Furthermore, since $f$ is an automorphism, we have

$$N(\alpha\beta) = \alpha\beta f(\alpha\beta) = \alpha\beta f(\alpha) f(\beta) = \alpha f(\alpha) \cdot \beta f(\beta) = N(\alpha) \cdot N(\beta).$$

In particular, $N(\alpha) \cdot N(\alpha^{-1}) = N(\alpha\alpha^{-1}) = N(1) = 1$. Hence, the norm has basically the same properties as the square of the modulus in $\mathbb{C}$.
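The arithmetic of a quadratic field can be modelled exactly by storing $a + b\sqrt d$ as a pair of rationals. The sketch below is illustrative only: the class name `QuadraticNumber` and its methods are hypothetical names chosen for the example. It implements the product rule for pairs, the conjugation automorphism, the norm, and the inverse.

```python
from fractions import Fraction

class QuadraticNumber:
    """Element a + b*sqrt(d) of Q(sqrt(d)), stored as the rational pair (a, b)."""

    def __init__(self, a, b, d):
        self.a, self.b, self.d = Fraction(a), Fraction(b), d

    def __mul__(self, other):
        # (a1 + b1*sqrt(d)) * (a2 + b2*sqrt(d)), using sqrt(d)^2 = d.
        a1, b1, a2, b2, d = self.a, self.b, other.a, other.b, self.d
        return QuadraticNumber(a1 * a2 + b1 * b2 * d, a1 * b2 + a2 * b1, d)

    def __eq__(self, other):
        return (self.a, self.b, self.d) == (other.a, other.b, other.d)

    def conjugate(self):
        # The automorphism a + b*sqrt(d) -> a - b*sqrt(d).
        return QuadraticNumber(self.a, -self.b, self.d)

    def norm(self):
        # N(alpha) = a^2 - d*b^2, a rational number.
        return self.a ** 2 - self.d * self.b ** 2

    def inverse(self):
        # alpha^(-1) = conjugate(alpha) / N(alpha); assumes alpha is non-zero.
        n = self.norm()
        return QuadraticNumber(self.a / n, -self.b / n, self.d)

alpha = QuadraticNumber(1, 2, 2)                  # 1 + 2*sqrt(2)
beta = QuadraticNumber(Fraction(3, 4), -1, 2)     # 3/4 - sqrt(2)
```

With these two sample elements one can confirm that the norm is multiplicative and that `alpha * alpha.inverse()` is the unit element of the field.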


Example 2 (Constructive number fields). We suppose the points (0,0) and (1,0) to be given to us on the cartesian plane $\mathbb{R}^2$. All subsequent constructions must be realized using only a compass and straight-edge. Once we construct two points $P$ and $Q$, we naturally consider the segment $PQ$ also to have been constructed. If we have constructed a point $P$ and a segment $r$, then we can also construct the circle with center $P$ and radius $r$. Points of intersection of any two lines or circles which have been constructed will similarly be considered to be constructible.

A complex number $a + ib \in \mathbb{C}$ is called constructive if it is possible to construct the point $P = (a, b)$ in finitely many steps (as described in the last paragraph), starting from (0,0) and (1,0). It is not hard to see that $a + ib$ is constructive if and only if both $|a|$ and $|b|$ are constructive. We let CS denote the set of points in the plane which can be constructed in this way by straight-edge and compass, i.e., the set of all constructive complex numbers.

THEOREM 5. The set CS is a subfield of $\mathbb{C}$.


Proof. It follows immediately from the definition of constructivity that CS is closed under addition and taking negatives (going from $z = a + ib$ to $-z = -a - ib$).

Now suppose we have constructed line segments of length $\alpha$ and $\beta$. The diagrams show how, by constructing similar triangles (the dotted lines), one can construct the product $\gamma = \alpha\beta$ and the ratio $\delta = \alpha/\beta$.

[Figure: two similar-triangle constructions, one for $\gamma = \alpha\beta$ and one for $\delta = \alpha/\beta$.]

But, in the final analysis, construction of $zz' = (a + ib)(a' + ib') = (aa' - bb') + i(ab' + a'b)$ and $1/z = a/(a^2 + b^2) - ib/(a^2 + b^2)$ reduces to construction of segments of the type $\gamma$ and $\delta$. Hence, the product and the reciprocal of constructive numbers are constructive. We have thereby proved that the set CS is closed under all of the field operations in $\mathbb{C}$. □

It is customary to call any subfield $K \subset$ CS a constructive number field. Obviously, $\mathbb{Q} \subset K$, and $K$ is a field of characteristic zero.


EXERCISES 


1. Find all complex numbers z of modulus 1 for which oe +(1+i)z is 


purely imaginary. Draw the locus of points z with this property on the complex plane. 


2. What can be said about the field R(8) which is obtained from R by adjoining 
a complex number § which satisfies the equation gc = oi? 
3. Let $A, B \in M_n(\mathbb{R})$. Using Theorem 1, prove that $\overline{\det(A + iB)} = \det(A - iB)$ (the bar denotes complex conjugation).

4. Let $A, B \in M_n(\mathbb{R})$, and let

$$C = \begin{pmatrix} A & -B \\ B & A \end{pmatrix} \in M_{2n}(\mathbb{R}).$$

Applying elementary transformations of type I and II over the field of complex numbers $\mathbb{C}$ to the real matrix $C$, show that $\det C = |\det(A + iB)|^2$.


5. (Pólya and Szegő). Using Exercises 3 and 4, explain the following "strange" fact. The square homogeneous linear system

$$c_{11}z_1 + \dots + c_{1n}z_n = 0, \quad \dots, \quad c_{n1}z_1 + \dots + c_{nn}z_n = 0 \tag{*}$$

with complex coefficients $c_{kl} = a_{kl} + ib_{kl}$ and unknowns $z_k = x_k + iy_k$ has a non-trivial solution $(z_1, \dots, z_n)$ if and only if $\det(c_{kl}) = a + ib = 0$ (see the general remarks on this in Subsection 7 of §4 Ch. 4). This condition leads to the two equations $a = 0$ and $b = 0$ involving $2n^2$ real numbers $a_{kl}, b_{kl}$. On the other hand, the system (*) can be written as a system of $2n$ linear homogeneous equations with $2n$ real unknowns $x_k, y_k$. In that case the condition for there to exist a non-trivial solution is the vanishing of a single real $2n \times 2n$ determinant, i.e., a single equation involving the $a_{kl}, b_{kl}$. Explain why these two conditions for a non-trivial solution are compatible.


6. Keeping in mind that an automorphism of the quadratic field $\mathbb{Q}(\sqrt d)$ must leave the rational numbers fixed, find the automorphisms of this field.

7. Find the sum of all n-th roots of 1 ($n > 1$). Find the sum of the primitive 12-th roots of 1 and the sum of the primitive 15-th roots of 1.

8. Find and give a geometrical picture of the kernel and image of the map $(\mathbb{R}, +) \to (\mathbb{C}^*, \cdot)$ (where $\mathbb{C}^* = \mathbb{C} \setminus \{0\}$) given by $t \mapsto e^{2\pi i t}$ (see formula (15)).

9. Show that $\varepsilon = \dfrac{2 + i}{2 - i}$ is not a root of 1, even though $|\varepsilon| = 1$.




§2. Rings of polynomials


Along with linear systems, which we studied in Chapters 2 and 3, the other 
traditional branch of algebra is the study of polynomials. A wide variety of mathematical 
problems can be stated or solved in the language of polynomials. There are many reasons 
for this, and one of them is the property of "universality" of polynomial rings, which we 
shall discuss briefly in Subsections 1 and 2. 

Let $R$ be a commutative (and, as usual, associative) ring with unit 1, and let $A$ be a subring containing 1. If $t \in R$, then the smallest subring of $R$ containing $A$ and $t$ is obviously the set of elements of the form

$$a(t) = a_0 + a_1t + a_2t^2 + \dots + a_nt^n, \tag{*}$$

where $a_i \in A$, $n \in \mathbb{Z}$, $n \ge 0$. We denote the ring $A[t]$ and call it the ring obtained by adjoining $t$ to $A$. The expression (*) is called a polynomial in $t$ with coefficients in $A$. It is clear from looking at a few simple examples how one computes the sum and product of polynomials:

$$a(t) + b(t) = (a_0 + a_1t + a_2t^2) + (b_0 + b_1t + b_2t^2) = (a_0 + b_0) + (a_1 + b_1)t + (a_2 + b_2)t^2,$$
$$a(t) \cdot b(t) = a_0b_0 + (a_0b_1 + a_1b_0)t + (a_0b_2 + a_1b_1 + a_2b_0)t^2 + (a_1b_2 + a_2b_1)t^3 + a_2b_2t^4.$$

Clearly, we have been able to write the terms in this way because of the commutativity of all of the elements $a_i$, $b_j$, $t^k$.


But now we should recall that $t$ was just an element of $R$ chosen at random, and so it is possible for two expressions (*) which look different really to be equal. For example, if $A = \mathbb{Q}$ and $t = \sqrt 2$, then $t^2 = 2$ and $t^3 = 2t$ are relations which in no way follow from the formal rules for working with polynomial expressions. In order to arrive at the customary notion of a polynomial, we must get rid of such extraneous special properties $t$ might have if it can be any element of $R$. We do this by taking $t$ to be an arbitrary symbol, not necessarily denoting an element of $R$. The choice of symbol is not important, but what is important are the rules for computing $a(t) + b(t)$ and $a(t)\,b(t)$. Keeping in mind these preliminary remarks, we now give the precise definition of the algebraic object which is called a polynomial, and of the set of such objects, the polynomial ring.


1. Polynomials in one variable. Let $A$ be any commutative ring with unit. We construct a new ring $B$ whose elements are infinite ordered sequences

$$f = (f_0, f_1, f_2, \dots), \quad f_i \in A, \tag{1}$$

such that all of the $f_i$ except for finitely many are equal to zero. We define the operations of addition and multiplication on $B$ by setting

$$f + g = (f_0, f_1, f_2, \dots) + (g_0, g_1, g_2, \dots) = (f_0 + g_0, f_1 + g_1, f_2 + g_2, \dots),$$
$$f \cdot g = h = (h_0, h_1, h_2, \dots),$$

where

$$h_k = \sum_{i+j=k} f_i g_j, \quad k = 0, 1, 2, \dots.$$


It is clear that, after adding or multiplying, we again obtain a sequence of the form (1) with only finitely many non-zero terms, i.e., an element of $B$. The verification of the ring axioms (see §4 Ch. 4) is completely obvious, except perhaps for the associative law. Namely, since addition of two elements of $B$ reduces to addition of a finite number of elements of $A$, it follows that $(B, +)$ is a commutative group with zero element $(0, 0, 0, \dots)$, and any element $f = (f_0, f_1, f_2, \dots)$ has additive inverse $-f = (-f_0, -f_1, -f_2, \dots)$. Next, multiplication is commutative because the expression for $h_k$ in terms of the $f_i$ and $g_j$ is symmetric, i.e., gives the same thing when $f$ and $g$ are interchanged. The expression for $h_k$ also shows that the distributive law $(f + g)h = fh + gh$ holds in $B$. We now show associativity of multiplication. Let

$$f = (f_0, f_1, f_2, \dots), \quad g = (g_0, g_1, g_2, \dots), \quad h = (h_0, h_1, h_2, \dots)$$

be any three elements of $B$. Then $fg = d = (d_0, d_1, d_2, \dots)$, where $d_l = \sum_{i+j=l} f_i g_j$, $l = 0, 1, 2, \dots$, and $(fg)h = dh = e = (e_0, e_1, e_2, \dots)$, where

$$e_s = \sum_{l+k=s} d_l h_k = \sum_{l+k=s} \Bigl(\sum_{i+j=l} f_i g_j\Bigr) h_k = \sum_{i+j+k=s} f_i g_j h_k.$$

If we compute $f(gh)$, we get the same result. Thus, $B$ is a commutative associative ring with unit $(1, 0, 0, \dots)$.
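The construction of $B$ translates directly into code if a sequence with finitely many non-zero terms is stored as a finite list of its leading entries. The following sketch takes $A = \mathbb{Z}$ (Python integers) as an assumption; the helper names `trim`, `add` and `mul` are hypothetical.

```python
def trim(seq):
    """Drop trailing zeros so that equal sequences are represented by equal lists."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return seq

def add(f, g):
    """Termwise sum (f_0 + g_0, f_1 + g_1, ...)."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return trim(a + b for a, b in zip(f, g))

def mul(f, g):
    """Product h with h_k equal to the sum of f_i * g_j over all i + j = k."""
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return trim(h)

X = [0, 1]                                   # the sequence (0, 1, 0, 0, ...)
f, g, h = [1, 2], [0, 1, 3], [5, 0, 0, 1]    # three sample elements of B
```

For these sample elements one can check that `mul(mul(f, g), h)` and `mul(f, mul(g, h))` agree, and that `mul(X, X)` is the sequence $(0, 0, 1, 0, \dots)$.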

Sequences of the form (a,0,0,0,...) are added and multiplied just like the 
elements of A. This allows us to identify such sequences with the corresponding elements 
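of $A$, i.e., to set $a = (a, 0, 0, 0, \dots)$ for all $a \in A$. This makes $A$ into a subring of $B$. Next, let $X$ denote the sequence $(0, 1, 0, 0, \dots)$. We call $X$ a variable (or unknown) over $A$. Using the multiplication operation that was introduced in $B$, we find that

$$X = (0, 1, 0, 0, \dots), \quad X^2 = (0, 0, 1, 0, \dots), \quad \dots, \quad X^n = (\underbrace{0, \dots, 0}_{n}, 1, 0, \dots). \tag{2}$$

In addition, using (2) and the inclusion $A \subset B$, we have

$$(\underbrace{0, \dots, 0}_{n}, a, 0, \dots) = aX^n = X^na.$$

Thus, if $f_n$ is the last non-zero term in the sequence $f = (f_0, f_1, \dots, f_n, 0, \dots)$, then in our new notation

$$f = (f_0, \dots, f_{n-1}, 0, \dots) + f_nX^n = (f_0, \dots, f_{n-2}, 0, \dots) + f_{n-1}X^{n-1} + f_nX^n = \dots = f_0 + f_1X + f_2X^2 + \dots + f_nX^n. \tag{3}$$

This representation for $f$ is unique, since the expression on the right is zero if and only if $f_0 = f_1 = \dots = f_n = 0$, i.e., if and only if $f = 0$.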




Definition. The ring $B$ introduced above is denoted $A[X]$ and is called the polynomial ring over $A$ in one variable $X$. Its elements are called polynomials.

Of course, the choice of a particular letter $X$ as the name of the variable is not a major terminological advance, but is traditional and avoids misunderstanding. We intentionally chose capital $X$ so as to be able to distinguish between the polynomial $f = X$ and the function theoretic small $x$ used to denote a variable which runs through some set of values. (This distinction is only temporary, and we shall not adhere to it rigorously later on.) A polynomial $f$ can be written either in the form (3) or in decreasing powers of $X$:

$$f(X) = a_0X^n + a_1X^{n-1} + \dots + a_{n-1}X + a_n.$$

We shall use both ways of writing $f$, depending on which seems more convenient in the given situation. The elements $f_i$ or $a_i$ are called the coefficients of the polynomial $f$. We call $f$ the zero polynomial if all of its coefficients are zero. The coefficient $f_0$ of $X$ to the zero power is called the constant term. If $f_n \ne 0$, then $f_n$ is called the leading coefficient, and $n$ is called the degree of the polynomial; we write $n = \deg f$. We adopt the convention of calling the degree of the zero polynomial $-\infty$ (with $-\infty + (-\infty) = -\infty$, $-\infty + n = -\infty$, and $-\infty < n$ for every $n \in \mathbb{N}$). Polynomials of degree 1, 2, 3, ... are called, respectively, linear, quadratic, cubic, etc.

The unit element in the ring $A[X]$ is the element 1 of the ring $A$, considered as a zero degree polynomial. It follows directly from the definition of addition and multiplication in $A[X]$ that for any two polynomials

$$f = f_0 + f_1X + \dots + f_nX^n, \quad g = g_0 + g_1X + \dots + g_mX^m \tag{4}$$

of degree $n$ and $m$, respectively, we have

$$\deg(f + g) \le \max(\deg f, \deg g), \quad \deg(fg) \le \deg f + \deg g. \tag{5}$$

The second inequality in (5) is actually an equality

$$\deg(fg) = \deg f + \deg g$$

whenever the product $f_ng_m$ of the leading coefficients in $f$ and $g$ is non-zero, because

$$fg = f_0g_0 + (f_0g_1 + f_1g_0)X + \dots + (f_ng_m)X^{n+m}. \tag{6}$$

This last fact gives us

THEOREM 1. If $A$ is an integral domain, then so is the ring $A[X]$. □
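The hypothesis on the leading coefficients matters. As an illustration (not taken from the text), the sketch below multiplies $1 + 2X$ and $1 + 3X$ with coefficients in $\mathbb{Z}/6\mathbb{Z}$, which has zero divisors, and in the field $\mathbb{Z}/7\mathbb{Z}$. The helper names `degree` and `mul_mod` are hypothetical, and coefficient lists are ordered from the constant term upward.

```python
def degree(f):
    """Degree of the coefficient list (f_0, f_1, ...); -infinity for the zero polynomial."""
    nonzero = [i for i, c in enumerate(f) if c != 0]
    return nonzero[-1] if nonzero else float("-inf")

def mul_mod(f, g, m):
    """Product of two polynomials whose coefficients are reduced modulo m."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] = (h[i + j] + fi * gj) % m
    return h

f, g = [1, 2], [1, 3]          # 1 + 2X and 1 + 3X
over_z6 = mul_mod(f, g, 6)     # leading coefficients multiply to 6 = 0 in Z/6Z
over_z7 = mul_mod(f, g, 7)     # Z/7Z is a field, hence an integral domain
```

Here `degree(over_z6)` is 1, strictly less than $\deg f + \deg g = 2$, while `degree(over_z7)` is 2, as the equality case predicts.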


The importance of polynomial rings in the class of commutative rings is partly due to 


the following 


THEOREM 2. Suppose that $R$ is a commutative ring containing the subring $A$. For every element $t \in R$ there exists a unique ring homomorphism $\Pi_t : A[X] \to R$ such that

$$\Pi_t(a) = a \quad \forall a \in A, \qquad \Pi_t(X) = t. \tag{7}$$

Proof. First suppose that such a homomorphism $\Pi_t$ exists. Since $\Pi_t(f_i) = f_i$ for each coefficient of $f$ written in the form (3), and $\Pi_t(X^k) = (\Pi_t(X))^k = t^k$ (by (7) and the property of being a homomorphism), it follows that

$$\Pi_t(f) = \Pi_t(f_0 + f_1X + \dots + f_nX^n) = f_0 + f_1t + \dots + f_nt^n, \tag{8}$$

i.e., $\Pi_t(f)$ is uniquely determined, and is given by (8). Conversely, if we define a map $\Pi_t$ by the formula (8), we obviously satisfy the condition (7), and also obtain a ring homomorphism. The map is clearly a homomorphism of the additive groups of the rings, and to see that $\Pi_t$ is a homomorphism with respect to multiplication it suffices to apply $\Pi_t$ to the product (6) and then use the general distributive law:

$$\Pi_t(fg) = f_0g_0 + (f_0g_1 + f_1g_0)t + \dots + f_ng_mt^{n+m} = \Bigl(\sum_i f_it^i\Bigr)\Bigl(\sum_j g_jt^j\Bigr) = \Pi_t(f) \cdot \Pi_t(g). \quad □$$


Applying the map $\Pi_t$ defined by (8) to a polynomial $f = f(X)$ is called substituting $t$ in place of $X$ in $f$, or else simply finding the value of $f$ at $X = t$; so we write $\Pi_t(f) = f(t)$. Knowing $\Pi_t(f)$ means being able to compute the value of $f$ at $X = t$. The homomorphisms $\Pi_x$ for $x \in A$ are the connecting link between the function theoretic and algebraic points of view on polynomials. By definition, the linear polynomial $X - c$ (i.e., the sequence $(-c, 1, 0, 0, \dots)$) is not zero, but the associated function $x \mapsto x - c$ takes the value zero when $x = c$. Here is another example: the non-zero polynomial $X^2 + X$ with coefficients in the field $\mathbb{F}_2$ (in which $1 + 1 = 0$) gives us the zero function $f : \mathbb{F}_2 \to \mathbb{F}_2$, since $0^2 + 0 = 0$ and $1^2 + 1 = 0$.
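Substitution can be sketched as follows, with a polynomial stored as its coefficient list $(f_0, f_1, \dots, f_n)$ and evaluated by Horner's rule. The names `evaluate` and `mul` are hypothetical helpers; integer coefficients and reduction modulo 2 stand in for $A = \mathbb{Z}$ and the two-element field.

```python
def evaluate(f, t):
    """Value f_0 + f_1*t + ... + f_n*t^n, computed by Horner's rule."""
    value = 0
    for coefficient in reversed(f):
        value = value * t + coefficient
    return value

def mul(f, g):
    """Product of two non-empty coefficient lists."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

f, g, t = [1, 2, 3], [-1, 0, 4], 5
# Substitution respects products: (fg)(t) = f(t) * g(t).
product_value = evaluate(mul(f, g), t)

# X^2 + X is a non-zero polynomial, yet modulo 2 it vanishes at both 0 and 1.
values_mod_2 = [evaluate([0, 1, 1], x) % 2 for x in (0, 1)]
```

The list `values_mod_2` is `[0, 0]`: distinct polynomials (here $X^2 + X$ and 0) can define the same function.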

An element $t \in R$ is called algebraic over $A$ if $\Pi_t(f) = 0$ for some non-zero $f \in A[X]$. But if $\Pi_t : A[X] \to R$ is a monomorphism, then $t$ is called a transcendental element over $A$. In the case $A = \mathbb{Q}$ and $R = \mathbb{C}$ we use the terms algebraic and transcendental numbers. The numbers $e$ and $\pi$ are transcendental, and examples of algebraic numbers are $\sqrt 2$, $\sqrt 3$, $\sqrt 2 + \sqrt 5$.

In order to measure by how much the ring $A[t] \subset R$ which was introduced at the beginning of the section differs from the polynomial ring $A[X]$, we consider the kernel $J = \operatorname{Ker} \Pi_t$ of the homomorphism $\Pi_t$ in Theorem 2. By (7), $\Pi_t$ acts as the identity map on $A$; hence, $A \cap J = 0$. Also, $J = 0$ if $t$ is transcendental over $A$. By the theorem on ring homomorphisms (Theorem 2 of Subsection 4, §4 Ch. 4), we have:

$$A[t] \cong A[X]/J. \tag{9}$$


The isomorphism (9) can be said to express a universal property of the polynomial 
ring A[X]. The universality of the ring of polynomials comes out more clearly in the 


following generalization of Theorem 2. 


THEOREM 3. Let $A$ and $R$ be arbitrary commutative rings, let $t$ be an element in $R$, and let $\varphi : A \to R$ be a homomorphism. Then there exists a unique extension of $\varphi$ to a homomorphism $\varphi_t : A[X] \to R$ which takes the variable $X$ to $t$.

Theorem 3 is proved in essentially the same way as Theorem 2; we leave the proof to the reader as an exercise. □


2. Polynomials in several variables. In the situation $A \subset R$, suppose we take $n$ elements $t_1, \dots, t_n \in R$ and consider the intersection of all subrings of $R$ containing $A$ and $t_1, \dots, t_n$. We then obtain a ring $A[t_1, \dots, t_n]$. As in the case $n = 1$, it is natural to introduce polynomial rings in $n$ variables. This is simple to do. Recall that the construction of $B = A[X]$ started with an arbitrary commutative ring with unit, the ring $A$. Hence, we can replace $A$ by $B$ in our construction and obtain the ring $C = B[Y]$, where $Y$ is a new independent variable, which plays the same role for $B$ as $X$ did for $A$. The elements of $C$ can be uniquely written in the form $\sum_j b_jY^j$, $b_j \in B$, and $B$ is identified with a subring of $C$, namely, the set of elements $bY^0 = b \cdot 1$. Since any element $b_j \in B$ can, in turn, be written uniquely in the form $b_j = \sum_i a_{ij}X^i$, $a_{ij} \in A$, it follows that any element of $C$ has the form

$$\sum_{i=0}^{k} \sum_{j=0}^{l} a_{ij}X^iY^j, \quad a_{ij} \in A,$$

where, by construction, $X$ and $Y$ commute with each other and with the $a_{ij}$. The ring $C = A[X, Y]$ is called the ring of polynomials over $A$ in the two variables $X$ and $Y$.


If we repeat this construction, we can obtain the ring $A[X_1, \dots, X_n]$ of polynomials over $A$ in the $n$ variables $X_1, \dots, X_n$. We agree to denote an n-tuple $(i_1, \dots, i_n)$ of non-negative integers by the symbol $(i)$. Then any element $f \in A[X_1, \dots, X_n]$ can be written in the form

$$f = \sum_{(i)} a_{(i)}X^{(i)}, \tag{10}$$

where $X^{(i)} = X_1^{i_1} \cdots X_n^{i_n}$ is a monomial. Thus, $f$ is a linear combination of monomials with coefficients in $A$. It follows from the definition of a polynomial that all of the coefficients $a_{(i)}$ in (10) except for finitely many are equal to zero.


The expression (10) for the polynomial $f$ is unique, as follows from the claim: a polynomial $f$ is equal to zero if and only if all of its coefficients $a_{(i)}$ are equal to zero. We have already seen this in the case $n = 1$. To prove it for $n > 1$, we use induction on $n$. Namely, we write

$$f = \sum_{i_n} b_{i_n}X_n^{i_n},$$

where

$$b_{i_n} = \sum_{i_1, \dots, i_{n-1}} a_{i_1 \dots i_{n-1} i_n}X_1^{i_1} \cdots X_{n-1}^{i_{n-1}}$$

are polynomials in a smaller number of variables. The truth of the claim for $n = 1$, along with the induction assumption, shows that

$$f = 0 \iff b_{i_n} = 0 \text{ for all } i_n \iff a_{(i)} = 0 \text{ for all } (i).$$

Thus, two polynomials are equal if and only if the coefficients of each monomial term are equal.


By the degree of $f$ in $X_k$, denoted $\deg_k f$, we mean the greatest integer which occurs in the exponent of $X_k$ in a term $a_{(i)}X^{(i)}$ for which $a_{(i)}$ is non-zero. For example, the polynomial $1 + X + XY^3 + X^2Y^2$ has degree 2 in $X$ and degree 3 in $Y$. The integer $i_1 + \dots + i_n$ is called the total degree of the monomial $X_1^{i_1} \cdots X_n^{i_n}$. By the total degree of a polynomial $f$, denoted $\deg f$, we mean the maximum of the total degrees of its non-zero monomial terms. We set $\deg 0 = -\infty$. It makes no sense to speak of the leading term of a polynomial in several variables, since there can be many monomial terms having the same maximum total degree.
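One convenient way to experiment with these notions is to store a polynomial in $n$ variables as a dictionary from exponent tuples $(i_1, \dots, i_n)$ to coefficients. This representation, and the helper names `degree_in` and `total_degree`, are assumptions made for the sketch; variables are indexed from 0 in the code.

```python
def degree_in(f, k):
    """Degree of f in the k-th variable (k counted from 0); -infinity for f = 0."""
    exponents = [e[k] for e, c in f.items() if c != 0]
    return max(exponents) if exponents else float("-inf")

def total_degree(f):
    """Maximum of i_1 + ... + i_n over the monomials with non-zero coefficient."""
    totals = [sum(e) for e, c in f.items() if c != 0]
    return max(totals) if totals else float("-inf")

# The sample polynomial 1 + X + X*Y^3 + X^2*Y^2 in the two variables X, Y.
f = {(0, 0): 1, (1, 0): 1, (1, 3): 1, (2, 2): 1}

# Monomials of maximal total degree: there are two of them, so no single leading term.
top_terms = [e for e, c in f.items() if c != 0 and sum(e) == total_degree(f)]
```

For this sample, `degree_in(f, 0)` is 2, `degree_in(f, 1)` is 3, and `total_degree(f)` is 4, attained by both $XY^3$ and $X^2Y^2$.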

Many of the results in Subsection 1 for $A[X]$ carry over to the ring $A[X_1, \dots, X_n]$. For example, using Theorem 1 and induction on $n$, we immediately obtain

THEOREM 1'. If $A$ is an integral domain, then so is $A[X_1, \dots, X_n]$. In particular, the ring of polynomials in $n$ variables over a field is an integral domain. □


Next, let $A$ be a subring of a commutative ring $R$, and let $t_1, \dots, t_n$ be elements of $R$. Then the map

$$\Pi_{t_1, \dots, t_n} : f(X_1, \dots, X_n) \mapsto f(t_1, \dots, t_n)$$

gives a homomorphism $A[X_1, \dots, X_n] \to R$ (compare with Theorem 2). We say that $t_1, \dots, t_n$ are substituted in place of $X_1, \dots, X_n$ in $f$, or that $f$ is evaluated at $X_1 = t_1, \dots, X_n = t_n$. If $\operatorname{Ker} \Pi_{t_1, \dots, t_n} = 0$, then $t_1, \dots, t_n$ are called algebraically independent over $A$. If the elements $t_1, \dots, t_n$ are algebraically dependent, then there exists a non-zero polynomial $f \in A[X_1, \dots, X_n]$ for which $f(t_1, \dots, t_n) = 0$.


Finally, the generalization of Theorem 3 is 


THEOREM 3' (universality of polynomial rings). Suppose that $A$ and $R$ are commutative rings, $t_1, \dots, t_n$ are elements of $R$, and $\varphi : A \to R$ is a ring homomorphism. Then there exists a unique extension of $\varphi$ to a homomorphism $\varphi_{t_1, \dots, t_n} : A[X_1, \dots, X_n] \to R$ which takes $X_i$ to $t_i$, $1 \le i \le n$.

The proof, like the construction of the ring $A[X_1, \dots, X_n]$, proceeds by induction. Theorem 3 gives the assertion when $n = 1$. Suppose that we have a homomorphism $\varphi_{t_1, \dots, t_{n-1}} : A[X_1, \dots, X_{n-1}] \to R$ which extends $\varphi$ and takes $X_i$ to $t_i$ for $1 \le i \le n-1$. Replacing $A$ by the ring $A[X_1, \dots, X_{n-1}]$ and $\varphi$ by $\varphi_{t_1, \dots, t_{n-1}}$ in Theorem 3, and using the fact that $A[X_1, \dots, X_n] = A[X_1, \dots, X_{n-1}][X_n]$, we find the desired homomorphism $\varphi_{t_1, \dots, t_n} = (\varphi_{t_1, \dots, t_{n-1}})_{t_n}$. It is clear that this homomorphism is unique, since a homomorphism from $A[X_1, \dots, X_n]$ is completely determined by its action on $A$ and on $X_1, \dots, X_n$. □


COROLLARY. To every permutation $\pi \in S_n$ of the set $\{1, \dots, n\}$ there corresponds a unique automorphism $\tilde\pi : f \mapsto \tilde\pi f$ of the ring $A[X_1, \dots, X_n]$ which is the identity on $A$ and satisfies

$$(\tilde\pi f)(X_1, \dots, X_n) = f(X_{\pi(1)}, \dots, X_{\pi(n)}).$$

Proof. In Theorem 3' set $R = A[X_1, \dots, X_n]$, $t_i = X_{\pi(i)}$ for $i = 1, \dots, n$, and take $\varphi$ to be the inclusion of $A$ into $A[X_1, \dots, X_n]$. Then Theorem 3' gives us a homomorphism $\tilde\pi = \varphi_{t_1, \dots, t_n}$ from $A[X_1, \dots, X_n]$ to itself (i.e., an endomorphism). Since $\pi\pi^{-1} = \pi^{-1}\pi = e$, $\tilde e = 1$, and $\widetilde{\pi\rho} = \tilde\pi\tilde\rho$ (this is verified in the lemma in §2 Ch. 4), it follows that $\tilde\pi$ is an automorphism. □


A useful refinement of Theorem 1’ is 


THEOREM 4. Let $f$ and $g$ be any two polynomials in $n$ variables over an integral domain $A$. Then

$$\deg(fg) = \deg f + \deg g.$$

Proof. A polynomial $h(X_1, \dots, X_n)$ all of whose monomial terms have the same total degree $m$ is called a homogeneous polynomial or a homogeneous form of degree $m$. Forms of degree 1, 2, 3 are called linear, quadratic, and cubic forms, respectively. If we combine all monomial terms of a given total degree which occur in $f$ (i.e., with non-zero coefficient), we can uniquely write $f$ as the sum of forms $f_m$ of different degrees:

$$f = f_0 + f_1 + \dots + f_k, \quad k = \deg f.$$

Now if

$$g = g_0 + g_1 + \dots + g_l, \quad l = \deg g,$$

then obviously

$$fg = f_0g_0 + (f_0g_1 + f_1g_0) + \dots + f_kg_l$$

(this resembles (6), except that the $f_i$ and $g_j$ have a different meaning here). Hence, $\deg(fg) \le k + l$. Since $f_k \ne 0$ and $g_l \ne 0$, Theorem 1' implies that $f_kg_l \ne 0$, so that $\deg(fg) = \deg(f_kg_l) = k + l = \deg f + \deg g$. □


3. The division algorithm. In addition to the general properties of polynomial rings studied in Subsection 2, these rings have some important special properties. We immediately discover one such property if we try to describe the ideals in a polynomial ring. We saw (see Subsection 3 of §4 Ch. 4) that every ideal in the ring $\mathbb{Z}$ is principal, i.e., of the form $m\mathbb{Z}$. The proof of this fact was based on a mechanism called the division algorithm and described for the ring $\mathbb{Z}$ in Subsection 3 of §8 Ch. 1. It turns out that a completely analogous algorithm holds in $A[X]$ whenever $A$ is an integral domain. In the case $A = \mathbb{R}$ this algorithm amounts to the usual long division of polynomials in high school algebra.


THEOREM 5. Let $A$ be an integral domain, and let $g$ be a polynomial in $A[X]$ whose leading coefficient is an invertible element of $A$. Then for every polynomial $f \in A[X]$ there exists a unique pair of polynomials $q, r \in A[X]$ for which

$$f = qg + r, \quad \deg r < \deg g. \tag{11}$$

Proof. Let

$$f = a_0X^n + a_1X^{n-1} + \dots + a_{n-1}X + a_n,$$
$$g = b_0X^m + b_1X^{m-1} + \dots + b_{m-1}X + b_m,$$

where $a_0b_0 \ne 0$ and $b_0 \mid 1$ (i.e., $b_0$ is invertible). We use induction on $n$. If $n = 0$ and $m = \deg g > \deg f = 0$, then set $q = 0$ and $r = f$; if $n = m = 0$, then set $r = 0$ and $q = a_0b_0^{-1}$. Now suppose that the theorem is true for all polynomials of degree $< n$ (where $n > 0$). Without loss of generality we may assume that $m \le n$, since otherwise we simply take $q = 0$ and $r = f$. With this assumption, we write

$$f = a_0b_0^{-1}X^{n-m} \cdot g + \bar f,$$

where $\deg \bar f < n$. By the induction assumption, we find $\bar q$ and $r$ for which $\bar f = \bar q g + r$, where $\deg r < m$. If we set

$$q = a_0b_0^{-1}X^{n-m} + \bar q,$$

we obtain a pair of polynomials with the required properties. $q$ is called the quotient and $r$ is called the remainder. It remains to prove that $q$ and $r$ are unique.

To see this, suppose that

$$qg + r = f = q'g + r', \quad \deg r' < \deg g.$$

Then $(q' - q)g = r - r'$. By Theorem 1, we have $\deg(r - r') = \deg(q' - q) + \deg g$. Since $\deg(r - r') < \deg g$, this can only happen if $r' = r$ and $q' = q$ (recall that $\deg 0 = -\infty$ and $-\infty + m = -\infty$).

Note that the coefficients of $q$ and $r$ belong to the same ring $A$ as the coefficients of $f$ and $g$, i.e., $f, g \in A[X] \Rightarrow q, r \in A[X]$. □
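The inductive step of the proof is effectively a loop: subtract a suitable multiple of $g$ to cancel the leading term, and repeat. The sketch below runs this loop over the field $\mathbb{Q}$, where every non-zero leading coefficient is invertible. The function name `divide` is hypothetical, and, unlike the proof's notation, coefficient lists are ordered from the constant term upward.

```python
from fractions import Fraction

def divide(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g, for g non-zero, over Q."""
    r = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:          # ignore trailing zero coefficients of g
        g.pop()
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    m = len(g) - 1                   # deg g
    q = [Fraction(0)] * max(len(r) - m, 1)
    while True:
        while r and r[-1] == 0:      # drop cancelled leading terms of the remainder
            r.pop()
        if len(r) - 1 < m:           # deg r < deg g: finished
            break
        shift = len(r) - 1 - m       # exponent n - m in the proof
        factor = r[-1] / g[-1]       # leading coefficient of r times the inverse of that of g
        q[shift] = factor
        for j, gj in enumerate(g):   # subtract factor * X^shift * g
            r[shift + j] -= factor * gj
    return q, r

# X^3 + 2X + 1 divided by X^2 + 1.
q1, r1 = divide([1, 2, 0, 1], [1, 0, 1])
# 2X^2 + 1 divided by the non-monic 2X + 1.
q2, r2 = divide([1, 0, 2], [1, 2])
```

In the first case the quotient is $X$ and the remainder $X + 1$. In the second the quotient is $X - \tfrac12$ and the remainder $\tfrac32$; the fractions appear because 2 is invertible in $\mathbb{Q}$ but not in $\mathbb{Z}$.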


Remark. The above process of dividing $f$ by $g$, which is known as the Euclidean algorithm, becomes simpler if $g$ is a monic polynomial, which means that its leading coefficient is 1.

Notice that in Theorem 5 the polynomial $f$ is divisible by $g$ if and only if $r = 0$.


COROLLARY. If $K$ is a field, then all ideals in the ring $K[X]$ are principal.

Proof. Let $T$ be a non-zero ideal in $K[X]$. Choose a non-zero polynomial $t = t(X)$ of minimal degree contained in $T$. Since $K$ is a field, the leading coefficient of $t$ must be invertible in $K$. If $f$ is any polynomial in $T$, then the division algorithm gives us $f = qt + r$, $\deg r < \deg t$. This equality implies that $r \in T$, since $f$, $t$, and $qt$ are elements of $T$. By our choice of $t$ this means that $r = 0$. Thus, $f(X)$ is divisible by $t(X)$, and so $T = (t) = tK[X]$, i.e., $T$ consists precisely of all polynomials divisible by $t(X)$. □


This corollary is false for polynomial rings in several variables over a field; for example, not all ideals in $\mathbb{R}[X, Y]$ are principal.

Example. The set $T = \{Xf + Yg \mid f, g \in \mathbb{R}[X, Y]\}$, which consists of all polynomials $h(X, Y)$ such that $h(0, 0) = 0$, is obviously an ideal in $\mathbb{R}[X, Y]$. If we had $T = t(X, Y)\,\mathbb{R}[X, Y]$, then, since $1 \in \mathbb{R}[X, Y]$, we would have $t \in T$, and so $t(0, 0) = 0$; thus, $\deg t \ge 1$. Now apply Theorem 4 to the equalities $X = tu$, $Y = tv$. We find that $\deg u = \deg v = 0$, i.e., $u, v \in \mathbb{R}$. But then $vX = uY$, which is absurd; hence, $T$ is not a principal ideal.


The corollary to Theorem 5 is convenient for giving an explicit description of the isomorphism (9). As an example, we prove a fact which really goes with Theorem 4 of §1.

THEOREM 6. The field of complex numbers $\mathbb{C}$ is isomorphic to the quotient ring $\mathbb{R}[X]/(X^2 + 1)\mathbb{R}[X]$.

Proof. According to (9), $\mathbb{C} = \mathbb{R}[i] \cong \mathbb{R}[X]/J$, where $J = \{f \in \mathbb{R}[X] \mid f(i) = 0\}$. Using the fact that $a + bX \notin J$ (because $a + ib \ne 0$ if $a$ and $b$ are not both 0) and $X^2 + 1 \in J$ (since $i^2 + 1 = 0$), it is not hard to show that $J = (X^2 + 1)\mathbb{R}[X]$ (use the same argument as in the proof of the corollary to Theorem 5).

The elements of the quotient ring $\mathbb{R}[X]/J$ are the cosets $(a + bX) + J$, $a, b \in \mathbb{R}$; the map $a + ib \mapsto (a + bX) + J$ gives an isomorphism between $\mathbb{C}$ and $\mathbb{R}[X]/J$. □
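The isomorphism of Theorem 6 can be tested concretely: multiplying two cosets means multiplying representatives $a + bX$ and reducing the product modulo $X^2 + 1$. The sketch below does this with integer coefficients for simplicity; the function names `reduce_mod_x2_plus_1` and `coset_product` are hypothetical.

```python
def reduce_mod_x2_plus_1(f):
    """Remainder (a, b), meaning a + bX, of the coefficient list f modulo X^2 + 1."""
    a = b = 0
    for k, c in enumerate(f):
        # X^k is congruent to 1, X, -1, -X, 1, ... modulo X^2 + 1.
        sign = -1 if k % 4 in (2, 3) else 1
        if k % 2 == 0:
            a += sign * c
        else:
            b += sign * c
    return a, b

def coset_product(p, q):
    """Product of the cosets (a + bX) + J and (c + dX) + J, as a reduced pair."""
    (a, b), (c, d) = p, q
    product = [a * c, a * d + b * c, b * d]   # (a + bX)(c + dX), constant term first
    return reduce_mod_x2_plus_1(product)

coset = coset_product((1, 2), (3, -1))
z = complex(1, 2) * complex(3, -1)
```

The pair `coset` matches the real and imaginary parts of `z`, as the map $a + ib \mapsto (a + bX) + J$ requires.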




EXERCISES 
1. The polynomials f(X) = ee -3X-1 and g(X)= ee 1 
can be considered as elements either in Z[X] or, for example, ZX] . Apply the 
division algorithm and show that in the first case f(X) is not divisible by g(X), but in 
the second case it is. Would it be possible to have an example where it was the other way 


around? 


2. Using Theorem 3, prove that, if $F$ is a field, then the group of all automorphisms of the ring $F[X]$ is isomorphic to the group of transformations $X \mapsto aX + b$, where $a, b \in F$ and $a \ne 0$.

3. Show that a polynomial $f \in F[X_1, \dots, X_n]$ is a form of degree $m$ (see the proof of Theorem 4) if and only if $f(tX_1, \dots, tX_n) = t^mf(X_1, \dots, X_n)$, where $t$ is a new variable.

4. Show that the number of different monomials in $n$ variables of total degree $m$ is equal to $\binom{m+n-1}{m}$.

5. Consider the set $A[[X]]$ of so-called formal power series $f(X) = \sum_{i \ge 0} a_iX^i$ in the variable $X$, which are defined as sequences $(a_0, a_1, a_2, \dots)$ as in Subsection 1 but with any (perhaps infinitely many) of the $a_i$ allowed to be non-zero. The operations with formal power series follow the same rules as in the case of polynomials:

$$\Bigl(\sum a_iX^i\Bigr) + \Bigl(\sum b_iX^i\Bigr) = \sum (a_i + b_i)X^i,$$
$$\Bigl(\sum a_iX^i\Bigr) \cdot \Bigl(\sum b_jX^j\Bigr) = \sum c_kX^k, \quad c_k = \sum_{i+j=k} a_ib_j.$$

Show that the set $A[[X]]$ with these operations is an associative and commutative ring with unit $1 = (1, 0, 0, 0, \dots)$.


It no longer makes sense to speak of the degree of $f$, since a formal power series may have arbitrarily large powers of $X$. Instead we define the order $\omega(f)$ to be the least integer $n$ for which $a_n \ne 0$ (we agree to set $\omega(0) = +\infty$). Show that

(i) $\omega(f - g) \ge \min\{\omega(f), \omega(g)\}$; (ii) $\omega(fg) \ge \omega(f) + \omega(g)$;

if $A$ is an integral domain, prove that $\omega(fg) = \omega(f) + \omega(g)$. In particular, if $A$ is an integral domain, then so is $A[[X]]$.

Further show that $A[X]$ is a subring of $A[[X]]$.


6. Polynomials and power series are often used as generating functions for 
different types of numbers. We illustrate with two simple examples. 


a) Prove the relation 


by using the binomial formula > () xi = (1+ aye in @[X] and the obvious factoring 
(io ler = (as Se 

b) Find the number os of possible ways one can arrange the parentheses in a 
product of n elements of a set with one binary operation. To do this it is useful to 
introduce the formal power series ("generating function" of the a) 


00) Me it eo a 


oh 2 


, 


the first few of whose coefficients were computed in Subsection 3 of &1 Ch. 4. The 


obvious relation 


: ; 2 
implies that 4(X) = 4(X)-X. Solve this quadratic equation to find 


Dee ee a ES 


2 


(the sign in front of the radical is determined by the condition that £ > 0). But the binomial 
n 




HH 


expansion applied to (1 -4x)* anda simple computation finally gives 


2n- 2 
jt 


it 
| ) 


The reader should fill in the various steps. 


7. The ring $A[[X, Y]]$ of formal power series in two independent variables $X$ and $Y$ (which commute with one another) consists of expressions $\sum_{i,j \ge 0} a_{ij}X^iY^j$. Show that $B[[Y]] \cong A[[X, Y]] \cong C[[X]]$, where $B = A[[X]]$ and $C = A[[Y]]$ (repeat the construction of the ring of polynomials in several variables). Show that if $A$ is an integral domain, then so is $A[[X, Y]]$.


§3. Factoring in polynomial rings


1. Elementary divisibility properties. In various places, starting in Chapter 1, we have touched upon questions of divisibility in the ring of integers $\mathbb{Z}$, but we have not yet proved the so-called Fundamental Theorem of Arithmetic. It is now time not only to fill in this gap but to prove this result for a wider class of rings, in particular, for the ring $K[X]$ of polynomials over a field $K$.

Suppose we have an arbitrary integral domain $R$. The invertible elements of $R$ can also be called the divisors of 1. For example, it is clear that a polynomial $f \in A[X]$ is invertible if and only if $\deg f = 0$ and $f = f_0$ is an invertible element in the ring $A$; this is because if we had $fg = 1$, we would have to have $\deg f + \deg g = \deg 1 = 0$.

We say that an element $b \in R$ is divisible by $a \in R$ (or that $b$ is a multiple of $a$) if there exists $c \in R$ such that $b = ac$; this is denoted $a \mid b$. If both $a \mid b$ and $b \mid a$, then $a$ and $b$ are called associated elements. In that case $b = ua$, where $u \mid 1$. By the remark in the last paragraph, two polynomials $f, g \in A[X]$ are associated if and only if they differ by a multiplicative factor which is an invertible element of $A$.


A non-zero element $p \in R$ is called prime (or irreducible) if $p$ is not invertible and cannot be written in the form $p = ab$, where both $a$ and $b$ are non-invertible. A field has no prime elements, because every non-zero element is invertible. A prime element of the ring $A[X]$ is called an irreducible polynomial.


We note the following basic properties of divisibility in an integral domain $R$:

1) If $a \mid b$ and $b \mid c$, then $a \mid c$. Namely, we have $b = ab'$ and $c = bc'$, where $b', c' \in R$. Hence $c = (ab')c' = a(b'c')$.

2) If $c \mid a$ and $c \mid b$, then $c \mid (a + b)$. Namely, since $a = ca'$, $b = cb'$ for some $a', b' \in R$, it follows by the distributive law that $a + b = c(a' + b')$.

3) If $a \mid b$, then $a \mid bc$. Clearly, if $b = ab'$, then $bc = (ab')c = a(b'c)$.
Combining 2) and 3), we obtain

4) If each element $b_1, b_2, \ldots, b_m \in R$ is divisible by $a \in R$, then so is any linear combination $b_1 c_1 + b_2 c_2 + \cdots + b_m c_m$, where $c_1, c_2, \ldots, c_m$ are any elements in $R$.


Definition. We say that an integral domain $R$ is a unique factorization domain (or a factorial ring) if any non-zero element $a \in R$ can be represented in the form

$$a = u p_1 p_2 \cdots p_r, \tag{1}$$

where $u$ is an invertible element and $p_1, p_2, \ldots, p_r$ are prime elements (not necessarily distinct from one another), and if, given another such decomposition $a = v q_1 q_2 \cdots q_s$, we must have

$$r = s, \quad q_1 = u_1 p_1, \ \ldots, \ q_r = u_r p_r$$

for a suitable choice of indexing of the $p$'s and $q$'s and for suitable invertible elements $u_1, \ldots, u_r$.


If we allow $r$ to equal 0 in (1), we are adopting the convention that invertible elements of $R$ also have a decomposition into prime factors, although a trivial one. It is clear that, if $p$ is a prime and $u$ is an invertible element, then the element $up$ is also a prime. For example, in $\mathbb{Z}$, which has invertible elements $1$ and $-1$, we can agree to choose the positive prime in each pair of associated primes $\{p, -p\}$. In a polynomial ring $K[X]$ over a field $K$ it is convenient to choose monic (i.e., leading coefficient 1) irreducible polynomials.


We have the following general result.

THEOREM 1. Let $R$ be any integral domain in which every element has a factorization (1). That factorization is unique (i.e., $R$ is a unique factorization domain) if and only if any prime $p \in R$ dividing a product $ab$, $a, b \in R$, must divide $a$ or $b$.


Proof. First suppose $R$ is a unique factorization domain. Let $ab = pc$. If

$$a = \prod a_i, \qquad b = \prod b_j, \qquad c = \prod c_k$$

are decompositions of $a, b, c$ into prime factors, then the equation $\prod a_i \prod b_j = p \prod c_k$ implies that $p$ must be associated with one of the primes $a_i$ or $b_j$, i.e., $p$ divides $a$ or $b$.

Conversely, suppose that whenever $p \mid ab$ we must have $p \mid a$ or $p \mid b$. We must show that $R$ is a unique factorization domain. We use induction on the minimal number $n$ of prime factors in a decomposition (1) of $a \in R$. We prove that any element $a$ which is a product of $n$ primes factors uniquely. If $n = 1$, i.e., if $a$ itself is a prime, this assertion is easy to check, and we leave this to the reader. So suppose that the assertion holds for any element which is a product of $\le n$ primes, and let $a \ne 0$ be a product of $n + 1$ prime factors. Let

$$a = \prod_{i=1}^{n+1} p_i = \prod_{j=1}^{m+1} r_j \tag{2}$$

be two decompositions of $a$ with $m \ge n$. Applying the hypothesis of the theorem to $p_{n+1}$, we find that $p_{n+1}$ must divide one of the elements $r_1, \ldots, r_{m+1}$. Without loss of generality (renumbering the $r$'s if necessary), we may suppose that $p_{n+1} \mid r_{m+1}$. But $r_{m+1}$ is a prime, so that $r_{m+1} = u p_{n+1}$, where $u$ is an invertible element. Using cancellation in $R$ (Theorem 3 of §4 Ch. 4), we see that (2) gives us

$$\prod_{i=1}^{n} p_i = u \prod_{j=1}^{m} r_j.$$

But the left side of this equality is a product of $n$ primes. By the induction assumption, $m = n$ and the two decompositions can only differ in the order of the primes and the possible presence of invertible elements $u$ as ratios of corresponding $p$'s and $r$'s. This completes the induction step. $\square$


In an arbitrary integral domain $R$ it is not necessarily true that every non-zero element $a$ has a prime decomposition (1). But what is even more interesting is that there are integral domains in which all elements have prime factorizations, but these factorizations are not unique. Thus, the condition in Theorem 1, which at first glance seems trivial, is not always satisfied.


Example. Consider the imaginary quadratic field $\mathbb{Q}(\sqrt{-5})$ (see the example in Subsection 5 of §1), and in this field consider the integral domain $R = \{a + b\sqrt{-5} \mid a, b \in \mathbb{Z}\}$. The norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$ of any non-zero element in $R$ is a positive integer. If $\alpha$ is invertible in $R$, then $N(\alpha)^{-1} = N(\alpha^{-1}) \in \mathbb{Z}$, and hence $N(\alpha) = 1$. This is only possible if $b = 0$ and $a = \pm 1$. Thus, in $R$, as in $\mathbb{Z}$, the only invertible elements are $\pm 1$. If $\alpha = \varepsilon \alpha_1 \alpha_2 \cdots \alpha_r$, $\varepsilon = \pm 1$, then $N(\alpha) = N(\alpha_1) \cdots N(\alpha_r)$. Since $1 < N(\alpha_i) \in \mathbb{N}$, it follows that for a given $\alpha$ the number of factors is bounded. This implies that any element $\alpha \in R$ has a prime factorization.

But the number 9 (and plenty of others) has two essentially different prime factorizations:

$$9 = 3 \cdot 3 = (2 + \sqrt{-5})(2 - \sqrt{-5}).$$

It is obvious that the elements $3$ and $2 \pm \sqrt{-5}$ are not associated. Furthermore, none of these elements of $R$ factors further, because they each have norm 9, and if $\alpha = \alpha_1 \alpha_2$ for $N(\alpha) = 9$ and neither $\alpha_1$ nor $\alpha_2$ equals $\pm 1$, then it follows that $N(\alpha_1) = N(\alpha_2) = 3$. But this is impossible, since the equation $x^2 + 5y^2 = 3$, $x, y \in \mathbb{Z}$, cannot be solved. Thus, we really do have two distinct prime factorizations of 9.
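The arithmetic in this example is easy to check by machine. The following is only an illustrative sketch: elements $a + b\sqrt{-5}$ are represented as integer pairs `(a, b)`, and the helper names `norm`, `mul` and `elements_of_norm` are introduced here for illustration and do not come from the text.

```python
def norm(a, b):
    """Norm of a + b*sqrt(-5), i.e. a^2 + 5*b^2."""
    return a * a + 5 * b * b


def mul(x, y):
    """Product of (a + b*sqrt(-5)) and (c + d*sqrt(-5)) as an integer pair."""
    a, b = x
    c, d = y
    # (a + b*s)(c + d*s) with s^2 = -5
    return (a * c - 5 * b * d, a * d + b * c)


def elements_of_norm(n):
    """All pairs (a, b) with a^2 + 5*b^2 == n, found by brute-force search."""
    bound = int(n ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]


# Both products equal 9, i.e. the pair (9, 0).
print(mul((3, 0), (3, 0)), mul((2, 1), (2, -1)))
# No element has norm 3, so 3 and 2 +/- sqrt(-5) cannot be factored further.
print(elements_of_norm(3))
```

The empty list returned by `elements_of_norm(3)` is the computational counterpart of the statement that $x^2 + 5y^2 = 3$ has no integer solutions.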

This example suggests a wide circle of questions, some of which have not yet been answered, concerning the quadratic fields $\mathbb{Q}(\sqrt{d})$. Such questions are studied in algebraic number theory.

Before using Theorem 1 to prove that various rings have unique factorization, we introduce some important auxiliary concepts, which are also of independent interest.


2. G.c.d. and l.c.m. in rings. Let $R$ be an integral domain. By a greatest common divisor of two elements $a, b \in R$, denoted $\operatorname{g.c.d.}(a, b)$, we mean any element $d \in R$ having the two properties

(i) $d \mid a$, $d \mid b$;

(ii) $c \mid a$ and $c \mid b \implies c \mid d$.

Clearly, if $d$ has properties (i) and (ii), then so does any associated element $ud$. Conversely, if $c$ and $d$ are two greatest common divisors of $a$ and $b$, then we have $c \mid d$ and $d \mid c$, so that $c$ and $d$ are associated. The notation $\operatorname{g.c.d.}(a, b)$ is used for any greatest common divisor, i.e., we do not distinguish here between associated elements. If we agree for the time being not to distinguish between associated elements, we have the following further properties of the g.c.d.:

(iii) $\operatorname{g.c.d.}(a, b) = a \iff a \mid b$;

(iv) $\operatorname{g.c.d.}(a, 0) = a$;

(v) $\operatorname{g.c.d.}(ta, tb) = t \cdot \operatorname{g.c.d.}(a, b)$;

(vi) $\operatorname{g.c.d.}(\operatorname{g.c.d.}(a, b), c) = \operatorname{g.c.d.}(a, \operatorname{g.c.d.}(b, c))$.

These properties are easily checked, and we leave that to the reader. Property (vi) allows us to extend the notion of g.c.d. to an arbitrary finite set of elements: if we define $\operatorname{g.c.d.}(a_1, \ldots, a_n) = \operatorname{g.c.d.}(\operatorname{g.c.d.}(\ldots \operatorname{g.c.d.}(a_1, a_2), \ldots), a_n)$, then this does not depend on the order of the $a$'s.




Along with the g.c.d. we have the companion concept of the least common multiple $m = \operatorname{l.c.m.}(a, b)$, which is defined (to within multiplication by a unit $u$, i.e., we again do not distinguish between associated elements) by the two properties:

(i') $a \mid m$, $b \mid m$;

(ii') $a \mid c$ and $b \mid c \implies m \mid c$.

In particular, setting $c = ab$, we obtain: $m \mid ab$.


THEOREM 2. Suppose that the g.c.d. and the l.c.m. exist for two elements $a, b$ in an integral domain $R$. Then:

(a) $\operatorname{l.c.m.}(a, b) = 0 \iff a = 0$ or $b = 0$;

(b) if $a, b \ne 0$, $m = \operatorname{l.c.m.}(a, b)$, and $ab = dm$, then $d = \operatorname{g.c.d.}(a, b)$.


Proof. Assertion (a) follows immediately from the definition of $\operatorname{l.c.m.}(a, b)$. To prove (b) we must show that the element $d$ defined by the equation $ab = dm$ has properties (i) and (ii). First, (i') implies that $m = a'a$, $m = b'b$. Hence $ab = dm = da'a$, so that, after cancelling $a$, as we are allowed to do in an integral domain, we obtain $b = da'$, i.e., $d \mid b$. We similarly show that $d \mid a$, and so $d$ satisfies (i).

Next, suppose that $a = fa''$, $b = fb''$. We must show that $f \mid d$. Set $c = fa''b''$. Then $c = ab'' = ba''$ is a common multiple of $a$ and $b$. By property (ii'), we have $c = c'm$ for some $c' \in R$, and hence $fc'm = fc = ab = dm$, i.e., $d = fc'$ and so $f \mid d$. $\square$


Note that Theorem 2 merely gives a relationship between the g.c.d. and the l.c.m. when they are known to exist. It does not give us a method for actually computing the g.c.d. and l.c.m., nor does it guarantee that the g.c.d. or the l.c.m. of two elements will always exist.


Now suppose for the time being that $R$ is a unique factorization domain. Let $P$ denote a set of prime elements in $R$ such that every prime in $R$ is associated with one and only one element of $P$. When we consider the prime factorizations of two elements $a, b \in R$, it is convenient to assume that the primes that appear in each are the same elements of $P$, but possibly with zero exponents, i.e., we write

$$a = u p_1^{k_1} \cdots p_r^{k_r}, \qquad b = v p_1^{l_1} \cdots p_r^{l_r}, \tag{3}$$

$$u \mid 1, \ v \mid 1; \qquad k_i \ge 0, \ l_i \ge 0; \qquad p_i \in P, \ 1 \le i \le r.$$


Using Theorem 1, we readily obtain the following easily remembered divisibility criterion.

Divisibility criterion. Let $a$ and $b$ be two elements of a unique factorization domain $R$ written in the form (3). Then:

1) $a \mid b$ if and only if $k_i \le l_i$, $i = 1, 2, \ldots, r$;

2) $\operatorname{g.c.d.}(a, b) = p_1^{s_1} \cdots p_r^{s_r}$, where $s_i = \min\{k_i, l_i\}$, $i = 1, 2, \ldots, r$;

3) $\operatorname{l.c.m.}(a, b) = p_1^{t_1} \cdots p_r^{t_r}$, where $t_i = \max\{k_i, l_i\}$, $i = 1, 2, \ldots, r$.

Thus, we take $s_i$ to be the smaller and $t_i$ to be the greater of the two exponents $k_i$, $l_i$. In particular, two elements $a$ and $b$ are relatively prime, i.e., $\operatorname{g.c.d.}(a, b) = 1$, if and only if the prime factors which occur with positive exponent in the factorization of one of them do not appear with positive exponent in the factorization of the other. The one trouble with this divisibility criterion is that in practice it is often very hard to obtain a decomposition of the form (3). Even in the case $R = \mathbb{Z}$, one has to be satisfied with some variation of the method of going through all primes less than some given $n$. For this reason, it is all the more gratifying that there is a more effective method for computing the g.c.d. and l.c.m. in a fairly large class of rings, as explained in the next subsection.
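For $R = \mathbb{Z}$ the criterion can be tried out directly. The sketch below is an illustration only, restricted to positive integers; it uses naive trial division (the "going through all primes" method mentioned above), and the function names `factorize`, `from_exponents` and `gcd_lcm` are chosen here and are not part of the text.

```python
from collections import Counter


def factorize(n):
    """Exponents k_i in n = p_1^k_1 ... p_r^k_r, by trial division (n >= 1)."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors


def from_exponents(exps):
    """Rebuild the integer from a {prime: exponent} mapping."""
    result = 1
    for p, k in exps.items():
        result *= p ** k
    return result


def gcd_lcm(a, b):
    """g.c.d. and l.c.m. via s_i = min(k_i, l_i) and t_i = max(k_i, l_i)."""
    fa, fb = factorize(a), factorize(b)
    primes = set(fa) | set(fb)          # missing primes get exponent 0
    g = {p: min(fa[p], fb[p]) for p in primes}
    m = {p: max(fa[p], fb[p]) for p in primes}
    return from_exponents(g), from_exponents(m)


d, m = gcd_lcm(360, 84)   # 360 = 2^3 * 3^2 * 5,  84 = 2^2 * 3 * 7
print(d, m, d * m == 360 * 84)
```

The final comparison `d * m == 360 * 84` illustrates the relation $ab = dm$ of Theorem 2(b). The cost of `factorize` grows quickly with the size of its argument, which is exactly the practical difficulty the text points out.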


3. Unique factorization in Euclidean rings. The division algorithm in $\mathbb{Z}$ and $K[X]$ (see Subsection 3 of §8 Ch. 1 and Subsection 3 of §2 in this chapter) makes it natural to consider the class of integral domains $R$ in which to every non-zero $a$ there is associated a non-negative integer $\delta(a)$ (i.e., $\delta$ is a map from $R^* = R \setminus \{0\}$ to $\mathbb{N} \cup \{0\}$) such that:

(E1) $\delta(ab) \ge \delta(a)$ for all $a, b \in R^*$;

(E2) given any $a, b \in R$, $b \ne 0$, there exist $q, r \in R$ ($q$ is called the "quotient" and $r$ is called the "remainder") such that

$$a = qb + r; \qquad \delta(r) < \delta(b) \ \text{ or } \ r = 0. \tag{4}$$

An integral domain $R$ for which such a function $\delta$ exists is called a Euclidean ring. If we define $\delta(a) = |a|$ for $a \in \mathbb{Z}$ and define $\delta(a) = \deg a$ for $a = a(X) \in K[X]$, we see that $\mathbb{Z}$ and $K[X]$ are Euclidean rings.

In Euclidean rings there is a special algorithm, called the Euclidean algorithm, for finding the g.c.d. of two elements $a$ and $b$. Let $a$ and $b$ be two non-zero elements of a Euclidean ring $R$. Applying the procedure in axiom (E2) a large enough (but finite) number of times, we obtain a sequence of equations of type (4) whose last equation has zero remainder:

$$\begin{aligned}
a &= q_1 b + r_1, & \delta(r_1) &< \delta(b), \\
b &= q_2 r_1 + r_2, & \delta(r_2) &< \delta(r_1), \\
r_1 &= q_3 r_2 + r_3, & \delta(r_3) &< \delta(r_2), \\
&\ \ \vdots \\
r_{k-2} &= q_k r_{k-1} + r_k, & \delta(r_k) &< \delta(r_{k-1}), \\
r_{k-1} &= q_{k+1} r_k.
\end{aligned} \tag{5}$$

This is the case because the strictly decreasing sequence of non-negative integers $\delta(b) > \delta(r_1) > \delta(r_2) > \cdots$ must terminate, and this is only possible when a remainder vanishes.

We claim that the last non-zero remainder $r_k$ is precisely the greatest common divisor of $a$ and $b$, as defined in Subsection 2. To see this, note that, by assumption, $r_k \mid r_{k-1}$. Going from the bottom up in (5) and using property 4) of the divisibility relation (see Subsection 1), we obtain the chain of divisibilities $r_k \mid r_{k-1}$, $r_k \mid r_{k-2}$, $\ldots$, $r_k \mid r_2$, $r_k \mid r_1$, and finally $r_k \mid b$ and $r_k \mid a$. Hence, $r_k$ is a common divisor of $a$ and $b$. Now suppose that $c$ is any other common divisor of $a$ and $b$. Then $c \mid r_1$, and, now going from the top down in (5), we obtain the chain of divisibility relations $c \mid r_2$, $c \mid r_3$, $\ldots$, $c \mid r_k$. Thus, the g.c.d. of $a$ and $b$ exists, and is equal to $r_k$:

$$r_k = \operatorname{g.c.d.}(a, b). \tag{6}$$


Next, notice that each remainder $r_i$ in (5) is a linear combination of the two preceding remainders $r_{i-1}$ and $r_{i-2}$ with coefficients in $R$. This is when $i \ge 3$; as for $r_2$, it is a linear combination of $b$ and $r_1$, and $r_1$ is a linear combination of $a$ and $b$. By successively substituting expressions for $r_{i-1}$ and $r_{i-2}$ in terms of $a$ and $b$, we eventually obtain an expression for $r_k$ of the form

$$r_k = au + bv \tag{7}$$

for some $u, v \in R$.
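The chain (5) and the back-substitution leading to (7) can be carried out in a single pass. The following sketch does this for $R = \mathbb{Z}$ with positive inputs, carrying along at each step the coefficients that express the current remainder in terms of $a$ and $b$; the function name `extended_euclid` is an illustrative choice, not notation from the text.

```python
def extended_euclid(a, b):
    """Return (d, u, v) with d = g.c.d.(a, b) and d = a*u + b*v.

    Intended for positive integers a and b. Invariant kept by the loop:
    r0 == a*u0 + b*v0 and r1 == a*u1 + b*v1.
    """
    r0, u0, v0 = a, 1, 0
    r1, u1, v1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1                      # quotient from axiom (E2)
        r0, r1 = r1, r0 - q * r1          # next remainder in the chain (5)
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    return r0, u0, v0                     # last non-zero remainder and (7)


d, u, v = extended_euclid(240, 46)
print(d, u, v, 240 * u + 46 * v)
```

Because the coefficients are updated alongside the remainders, no separate bottom-up substitution is needed; the pair `(u, v)` is one of many possible choices satisfying (7).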


Comparing (6) and (7) and taking into account Theorem 2(b), we obtain:

THEOREM 3. In a Euclidean ring $R$ any two elements $a$ and $b$ have a greatest common divisor and a least common multiple. Using the Euclidean algorithm it is possible to find $u, v \in R$ such that $\operatorname{g.c.d.}(a, b) = au + bv$. In particular, two elements $a, b \in R$ are relatively prime if and only if there exist elements $u, v \in R$ such that

$$au + bv = 1. \qquad \square$$

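COROLLARY. Let $a, b, c$ be elements of a Euclidean ring $R$.

(i) If $\operatorname{g.c.d.}(a, b) = 1$ and $\operatorname{g.c.d.}(a, c) = 1$, then $\operatorname{g.c.d.}(a, bc) = 1$.

(ii) If $a \mid bc$ and $\operatorname{g.c.d.}(a, b) = 1$, then $a \mid c$.

(iii) If $b \mid a$ and $c \mid a$ and $\operatorname{g.c.d.}(b, c) = 1$, then $bc \mid a$.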


Proof. (i) By Theorem 3, we have $au_1 + bv_1 = 1$ and $au_2 + cv_2 = 1$. Multiplying together the left and right sides of the two equations, we obtain:

$$a(au_1u_2 + bu_2v_1 + cu_1v_2) + bc(v_1v_2) = 1,$$

which gives the desired conclusion.

(ii) Since $au + bv = 1$, we have $ac \cdot u + (bc)v = c$. But $bc = aw$, and hence $c = a(cu + wv)$, i.e., $a \mid c$.

(iii) By property (ii') of the l.c.m.,

$$b \mid a, \ c \mid a \implies \operatorname{l.c.m.}(b, c) \mid a \implies bc \mid a,$$

since $bc = \operatorname{g.c.d.}(b, c) \cdot \operatorname{l.c.m.}(b, c)$, and $\operatorname{g.c.d.}(b, c) = 1$ by assumption. $\square$


We leave it to the reader to generalize Theorem 3 to the case of the g.c.d. of a finite set of elements in a Euclidean ring.

The next lemma is an important step in proving that Euclidean rings have unique factorization.


LEMMA. Every Euclidean ring $R$ has factorization (i.e., any non-zero element $a \in R$ can be written in the form (1)).

Proof. Suppose that the element $a \in R$ has a proper divisor $b$, i.e., $a = bc$, where $b$ and $c$ are non-invertible elements (in other words, $a$ and $b$ are not associated). We prove that $\delta(b) < \delta(a)$. To see this, first use (E1) to obtain: $\delta(b) \le \delta(bc) = \delta(a)$. If we had $\delta(b) = \delta(a)$, by (E2) we could find $q$ and $r$ with $b = qa + r$, where $\delta(r) < \delta(a)$ or else $r = 0$. The possibility $r = 0$ would contradict the fact that $a$ and $b$ are not associated, i.e., $a \nmid b$. Furthermore, since $c$ is not invertible, we have: $1 - qc \ne 0$. But then, by (E1),

$$\delta(a) = \delta(b) \le \delta(b(1 - qc)) = \delta(b - qa) = \delta(r) < \delta(a),$$

a contradiction. Thus, $\delta(b) < \delta(a)$.

Now if $a = a_1 a_2 \cdots a_n$, where all of the $a_i$ are non-invertible, then for each $m$ the element $a_{m+1} a_{m+2} \cdots a_n$ is a proper divisor of $a_m a_{m+1} \cdots a_n$, and so

$$\delta(a) = \delta(a_1 a_2 \cdots a_n) > \delta(a_2 \cdots a_n) > \cdots > \delta(a_n) > \delta(1).$$

This strictly decreasing sequence of non-negative integers has length $n \le \delta(a)$. Hence, there is a maximum value $n$ such that the decomposition $a = a_1 a_2 \cdots a_n$ can split up no further, i.e., all of the $a_i$ are prime elements. $\square$


THEOREM 4. Every Euclidean ring $R$ is a unique factorization domain.

Proof. Using the above lemma and the criterion in Theorem 1, we see that it suffices to prove that, if $p$ is a prime element of $R$ dividing the product $bc$ of two elements $b, c \in R$, then $p$ must divide either $b$ or $c$.

First, if $b = 0$ or $c = 0$, there is nothing left to prove. But if $bc \ne 0$ and $d = \operatorname{g.c.d.}(b, p)$, then $d$, since it divides $p$, must be either $p$ or 1 (more precisely, an element associated to one of these two elements). If $d = 1$, then $b$ and $p$ are relatively prime, and part (ii) of the corollary to Theorem 3 allows us to conclude that $p \mid c$. If $d = p$, then $p \mid b$. $\square$

COROLLARY. The rings $\mathbb{Z}$ and $K[X]$ (where $K$ is any field) are unique factorization domains. $\square$


The polynomial rings $K[X_1, \ldots, X_n]$, where $n > 1$, which are not Euclidean rings, are nonetheless unique factorization domains, as we shall prove in Chapter 9. We shall also give further examples there of unique factorization domains.


4. Irreducible polynomials. Recall that a special case of the above definition of a prime element of a ring is: a polynomial $f \in K[X]$ of degree greater than zero is called irreducible (over the field $K$) if it is not divisible by any polynomial $g \in K[X]$ for which $0 < \deg g < \deg f$. In particular, every linear (first degree) polynomial is irreducible. It is easy to see that the question of whether or not a polynomial of degree $> 1$ is irreducible and the problem of decomposing it into a product of irreducible factors are intimately connected with the "ground field" $K$. For example, the polynomial $X^2 + 1$ is irreducible in $\mathbb{R}[X]$ but decomposes in $\mathbb{C}[X]$ as $X^2 + 1 = (X + i)(X - i)$. The polynomial $X^4 + 4$ can be factored even over $\mathbb{Q}$, although this fact is not easy to guess at first glance:

$$X^4 + 4 = (X^2 - 2X + 2)(X^2 + 2X + 2).$$

Both of the factors on the right are irreducible not only over $\mathbb{Q}$, but even over $\mathbb{R}$; however, they are reducible over $\mathbb{C}$.
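Both factorizations can be verified by multiplying out. In the sketch below a polynomial is stored as a list of coefficients, lowest degree first; this representation and the names `poly_mul` and `discriminant` are assumptions made for the illustration only.

```python
def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists [c0, c1, ...]."""
    result = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += a * b
    return result


def discriminant(q):
    """Discriminant b^2 - 4ac of a quadratic c + b*X + a*X^2 = [c, b, a]."""
    c, b, a = q
    return b * b - 4 * a * c


left = [2, -2, 1]    # X^2 - 2X + 2
right = [2, 2, 1]    # X^2 + 2X + 2
print(poly_mul(left, right))                    # coefficients of X^4 + 4
print(discriminant(left), discriminant(right))  # both negative
print(poly_mul([1j, 1], [-1j, 1]))              # (X + i)(X - i) over C
```

A negative discriminant means that the quadratic has no real root, which for a polynomial of degree 2 is the same as being irreducible over $\mathbb{R}$ (and hence over $\mathbb{Q}$).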

The set that plays the role of the positive prime numbers in $\mathbb{Z}$ is the set of monic (i.e., leading coefficient $= 1$) irreducible polynomials. As in the case of the primes in $\mathbb{Z}$ (see §8 of Chapter 1), the set of monic irreducible polynomials over any field $K$ is infinite. This is obvious in the case of an infinite field $K$; just take the irreducible polynomials $X - c$, $c \in K$. If $K$ is a finite field, we use an argument of Euclid. Namely, suppose we already have $n$ irreducible polynomials $p_1, \ldots, p_n$. The polynomial $f = p_1 p_2 \cdots p_n + 1$ has at least one monic irreducible polynomial divisor, since $\deg f \ge n$. Let us denote it $p_{n+1}$. It is different from $p_1, \ldots, p_n$, since if we had $p_{n+1} = p_s$ for some $s \le n$, it would follow that $p_s \mid (f - p_1 \cdots p_n) = 1$. Thus, $K[X]$ has infinitely many monic irreducible polynomials.

Since there are only finitely many polynomials of a given degree over a finite field, we have the following useful corollary of the result of the preceding paragraph: There exist irreducible polynomials of arbitrarily high degree over any finite field. We will obtain a more precise version of this qualitative assertion in Chapter 9.

Irreducible polynomials over $\mathbb{Q}$ play a vital role in the theory of algebraic number fields. Since we can always multiply a polynomial in $\mathbb{Q}[X]$ by a suitable natural number to obtain a polynomial in $\mathbb{Z}[X]$, we might first want to clarify the relationship between irreducibility over $\mathbb{Q}$ and over $\mathbb{Z}$. Since we will also be interested in other applications, we prove a general assertion about polynomials over a unique factorization domain $R$. By the content $d = d(f)$ of a polynomial $f = a_0 + a_1 X + \cdots + a_n X^n \in R[X]$ we mean the greatest common divisor of all of the coefficients. Up till now we have only discussed the g.c.d. of two elements, but the properties (i)-(vi) of the g.c.d. allow us to extend this notion to any finite set of elements of an integral domain. If $d(f)$ is an invertible element of $R$, then $f$ is called a primitive polynomial.


GAUSS'S LEMMA. Let $R$ be a unique factorization domain, and let $f, g \in R[X]$. Then

$$d(fg) = d(f)\,d(g),$$

where equality is understood to mean to within an invertible element of $R$, i.e., we do not distinguish between associated elements. In particular, the product of two primitive polynomials is a primitive polynomial.

Proof. We first prove that the product of two primitive polynomials is primitive. Let

$$f = a_0 + a_1 X + \cdots + a_n X^n, \qquad g = b_0 + b_1 X + \cdots + b_m X^m$$

be two primitive polynomials in $R[X]$ whose product $fg$ is not primitive. Thus, there exists a prime element $p \in R$ which divides $d(fg)$. Choose the least indices $s$ and $t$ such that $p \nmid a_s$ and $p \nmid b_t$, as we can do, since $f$ and $g$ are primitive. The coefficient of $X^{s+t}$ in $fg$ is

$$c_{s+t} = a_s b_t + a_{s+1} b_{t-1} + a_{s+2} b_{t-2} + \cdots + a_{s-1} b_{t+1} + a_{s-2} b_{t+2} + \cdots.$$

Since we have assumed that $a_{s-i}$ and $b_{t-i}$ are divisible by $p$ for $i > 0$, and since $p \mid c_{s+t}$ because $p \mid d(fg)$, it follows that for suitable $u$ and $v$

$$a_s b_t + pu = pv,$$

so that $p \mid a_s b_t$. Since $R$ is a unique factorization domain, this means that $p \mid a_s$ or $p \mid b_t$, a contradiction. This proves our claim that $fg$ is primitive.


Proceeding to the general case, we can write any two polynomials $f, g \in R[X]$ in the form

$$f = d(f) f_0, \qquad g = d(g) g_0,$$

where $f_0$ and $g_0$ are primitive polynomials. Since $fg = d(f)\,d(g)\,f_0 g_0$, and we have proved that $d(f_0 g_0) = 1$, it follows that $d(fg) = d(f)\,d(g)$. $\square$
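For $R = \mathbb{Z}$ the identity $d(fg) = d(f)\,d(g)$ can be observed on a concrete pair of polynomials. This is a numerical illustration, not a proof; polynomials are coefficient lists with the lowest degree first, and the names `content` and `poly_mul` are introduced here for the example.

```python
from functools import reduce
from math import gcd


def content(f):
    """Content d(f): the g.c.d. of the integer coefficients of f (taken >= 0)."""
    return reduce(gcd, (abs(c) for c in f))


def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists [c0, c1, ...]."""
    result = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += a * b
    return result


f = [6, 10, 4]    # 6 + 10X + 4X^2,  content 2
g = [9, 3, 15]    # 9 + 3X + 15X^2,  content 3
fg = poly_mul(f, g)
print(content(f), content(g), content(fg))
```

Taking absolute values fixes a representative among the associated elements $\pm d$, in line with the convention that equality of contents is understood up to an invertible factor.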




COROLLARY. A polynomial $f \in \mathbb{Z}[X]$ which is irreducible over $\mathbb{Z}$ is also irreducible over $\mathbb{Q}$.

Proof. By the corollary to Theorem 4, $\mathbb{Z}$ is a unique factorization domain, so that Gauss's Lemma applies to $\mathbb{Z}[X]$. Suppose that $f = gh$, where $f \in \mathbb{Z}[X]$ and $g, h \in \mathbb{Q}[X]$. If we multiply both sides of this equation by the least common multiple of the denominators of the coefficients in $g$ and $h$, we obtain $af = b g_0 h_0$, where $a, b \in \mathbb{Z}$ and $g_0, h_0$ are primitive polynomials in $\mathbb{Z}[X]$. By Gauss's Lemma, $a\,d(f) = b$, so that we can divide by $a$ to obtain: $f = d(f) g_0 h_0$, which is a factorization of $f$ over $\mathbb{Z}$. $\square$


Eisenstein Irreducibility Criterion. Let

$$f(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n$$

be a monic polynomial over $\mathbb{Z}$ whose coefficients $a_1, \ldots, a_n$ are all divisible by some prime $p$, but whose coefficient $a_n$ is not divisible by $p^2$. Then $f(X)$ is irreducible in $\mathbb{Q}[X]$.


Proof. If we suppose the contrary and use the corollary to Gauss's Lemma, we can write $f$ as a product of two monic polynomials over $\mathbb{Z}$:

$$f(X) = (X^s + b_1 X^{s-1} + \cdots + b_s)(X^t + c_1 X^{t-1} + \cdots + c_t), \qquad s, t > 0, \quad s + t = n.$$

This factorization is preserved when we pass to the quotient ring $\mathbb{Z}[X]/(p) \cong \mathbb{Z}_p[X]$, whose elements are obtained from polynomials with integral coefficients by reducing modulo $p$. By assumption, $\bar{a}_i = 0$, where $\bar{a}_i$ is the residue class mod $p$ of $a_i$. But $\mathbb{Z}_p[X]$ is a unique factorization domain, by the corollary to Theorem 4. Comparing the two decompositions

$$X^n = X^s \cdot X^t = (X^s + \bar{b}_1 X^{s-1} + \cdots + \bar{b}_s)(X^t + \bar{c}_1 X^{t-1} + \cdots + \bar{c}_t),$$

we come to the conclusion that $\bar{b}_i = 0 = \bar{c}_j$, i.e., all of the coefficients $b_i$ and $c_j$ are divisible by $p$. In that case $a_n = b_s c_t$ is divisible by $p^2$, a contradiction. This proves the Eisenstein Irreducibility Criterion. $\square$


Remark. The above criterion also applies when the leading coefficient $a_0$ is not 1 but is still prime to $p$.

Example. The polynomial $f(X) = X^{p-1} + X^{p-2} + \cdots + X + 1$ is irreducible over $\mathbb{Q}$ for any prime $p$.

To see this, it suffices to notice that irreducibility of $f$ is equivalent to irreducibility of the polynomial

$$f(X + 1) = \frac{(X + 1)^p - 1}{(X + 1) - 1} = X^{p-1} + \binom{p}{1} X^{p-2} + \binom{p}{2} X^{p-3} + \cdots + \binom{p}{p-1},$$

to which the Eisenstein Irreducibility Criterion applies, since all of the coefficients after the leading one are divisible by $p$, and the constant term $\binom{p}{p-1} = p$ is divisible by $p$ only to the first power (see Exercise 8 of §4 Ch. 4 for this property of binomial coefficients).
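The hypotheses of the criterion for $f(X + 1)$ can be checked mechanically for small $p$. In the sketch below the coefficient list (lowest degree first) is built from binomial coefficients; the function names `shifted_polynomial` and `eisenstein_applies` are illustrative, and the check only tests the hypotheses of the criterion, it does not test irreducibility in general.

```python
from math import comb


def shifted_polynomial(p):
    """Coefficients of f(X+1) = ((X+1)^p - 1)/X, lowest degree first.

    The coefficient of X^(k-1) is the binomial coefficient C(p, k).
    """
    return [comb(p, k) for k in range(1, p + 1)]


def eisenstein_applies(coeffs, p):
    """Hypotheses of the criterion for a monic integer polynomial."""
    *lower, leading = coeffs
    return (leading == 1
            and all(c % p == 0 for c in lower)   # p divides a_1, ..., a_n
            and lower[0] % (p * p) != 0)         # p^2 does not divide a_n


print(shifted_polynomial(5))
print(eisenstein_applies(shifted_polynomial(5), 5))
print(eisenstein_applies(shifted_polynomial(4), 4))  # 4 is not prime
```

For the composite value 4 the test fails because $\binom{4}{2} = 6$ is not divisible by 4, which shows where primality of $p$ enters the argument.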
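EXERCISES

1. Show that

$$n\mathbb{Z} + m\mathbb{Z} = \operatorname{g.c.d.}(n, m)\,\mathbb{Z}, \qquad n\mathbb{Z} \cap m\mathbb{Z} = \operatorname{l.c.m.}(n, m)\,\mathbb{Z}.$$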


2. Let $f$ and $g$ be monic polynomials in $\mathbb{Z}[X]$. Show that in the equation $\operatorname{g.c.d.}(f, g) = fu + gv$ with $u, v \in \mathbb{Z}[X]$ we may take $u$ and $v$ so that $\deg u < \deg g$ and $\deg v < \deg f$.
3. Are the rings $\mathbb{Z}[\sqrt{-3}]$ and Zot X] unique factorization domains?
4. Factor $X^n - 1$ into irreducible factors in $\mathbb{Z}[X]$ for $5 < n < 12$.


5. Prove that the irreducible factors in the factorization of a homogeneous polynomial

$$f(X, Y) = a_0 X^n + a_1 X^{n-1} Y + \cdots + a_{n-1} X Y^{n-1} + a_n Y^n \in \mathbb{Q}[X, Y]$$

are homogeneous, and that $f(X, Y)$ is irreducible if and only if the polynomial $f(X, 1) = a_0 X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n \in \mathbb{Q}[X]$ is irreducible.


6. Let $K$ be a field, and let $f(X) = \sum_{i \ge 0} a_i X^i$ be a formal power series in $K[[X]]$ (see Exercise 5 in §2). The condition $a_0 \ne 0$, or equivalently $\omega(f) = 0$, is necessary and sufficient for there to exist a power series $g(X) \in K[[X]]$ such that $fg = 1$. For example, $(1 - X)^{-1} = \sum_{i \ge 0} X^i$. The element $X$ is the only prime in $K[[X]]$ (except for associated elements). $K[[X]]$ is a unique factorization domain. Prove these assertions.


7. Prove that

$$\det(x_{ij}) = \sum_{\pi \in S_n} \varepsilon_\pi\, x_{\pi(1),1} \cdots x_{\pi(n),n}$$

is an irreducible homogeneous polynomial of degree $n$ in the $n^2$ independent variables $x_{ij}$.


§4. The field of fractions


1. Construction of the field of fractions of an integral domain. The last two sections established many properties that $\mathbb{Z}$ and $K[X]$ have in common. Our next goal is to imbed $K[X]$ in a field (just as $\mathbb{Z}$ is imbedded in $\mathbb{Q}$). But it is actually no harder to do the same for any integral domain $A$.

Consider the set $A \times A^*$ ($A^* = A \setminus \{0\}$) of all pairs $(a, b)$ of elements $a, b \in A$ with $b \ne 0$. We partition this set into classes by considering two pairs $(a, b)$ and $(c, d)$ to belong to the same class if $ad = bc$; in that case we write $(a, b) \sim (c, d)$. We obviously always have $(a, b) \sim (a, b)$, and also $(a, b) \sim (c, d) \implies (c, d) \sim (a, b)$. Finally, $(a, b) \sim (c, d)$ and $(c, d) \sim (e, f) \implies (a, b) \sim (e, f)$, since $ad = bc$ and $cf = de$ imply that $adf = bcf = bde$, i.e., $d(af - be) = 0$. But $d \ne 0$, and, since $A$ is an integral domain, we obtain $af = be$, which means that $(a, b) \sim (e, f)$. Thus, the relation $\sim$ is reflexive, symmetric, and transitive, i.e. (see §6 of Ch. 1), it is an equivalence relation on the set $A \times A^*$ and so gives a partition of $A \times A^*$ into disjoint classes.


Let $Q(A)$ be the set of all equivalence classes, i.e., the quotient set $A \times A^*/\!\sim$. We shall let $[a, b]$ denote the class in which $(a, b)$ lies. By definition,

$$[a, b] = [c, d] \iff ad = bc. \tag{1}$$

If we define addition and multiplication operations as follows on the set $A \times A^*$:

$$(a, b) + (c, d) = (ad + bc, bd); \qquad (a, b)(c, d) = (ac, bd)$$

(this makes sense, since $bd \ne 0$ whenever $b \ne 0$ and $d \ne 0$), then these binary operations can be carried over to $Q(A)$. To see this, we must show that

$$(a, b) \sim (a', b') \implies \begin{cases} (a, b) + (c, d) \sim (a', b') + (c, d), \\ (a, b)(c, d) \sim (a', b')(c, d). \end{cases}$$

The equivalences on the right will hold if

$$(ad + bc)\,b'd = (a'd + b'c)\,bd, \qquad ac \cdot b'd = a'c \cdot bd,$$

and these equations follow immediately from the condition $a'b = ab'$. We have a similar result if $(c, d)$ is replaced by $(c', d')$, where $cd' = c'd$. Thus, the sum and product of two elements in $Q(A)$ do not depend on the choice of representatives of the equivalence classes, and we have:

$$[a, b] + [c, d] = [ad + bc, bd]; \qquad [a, b][c, d] = [ac, bd]. \tag{2}$$

Here we really should write $[a, b] \oplus [c, d]$ and $[a, b] \odot [c, d]$, but there is no loss of clarity if we use the same symbols $+$ and $\cdot$ as before for addition and multiplication in $Q(A)$.


We now show that $Q(A)$ is a field under the operations (2). First, the relations

$$[a, b] + ([c, d] + [e, f]) = [a, b] + [cf + de, df] = [adf + bcf + bde, bdf],$$

$$([a, b] + [c, d]) + [e, f] = [ad + bc, bd] + [e, f] = [adf + bcf + bde, bdf]$$

imply the associative law for addition. Associativity of multiplication is obvious. Next, the relations

$$([a, b] + [c, d])[e, f] = [ade + bce, bdf],$$

$$[a, b][e, f] + [c, d][e, f] = [aedf + bfce, bfdf] = [ade + bce, bdf]$$

and the condition (1) for two equivalence classes to be equal show that the distributive law holds. It is just as easy to check that addition and multiplication are commutative. The zero for addition is $[0, 1]$ (since $[0, 1] + [a, b] = [a, b]$) and the identity for multiplication is $[1, 1]$. Next, $-[a, b] = [-a, b]$, since $[a, b] + [-a, b] = [0, b^2] = [0, 1]$. So far we have shown that $Q(A)$ is a commutative ring with unit. If $[a, b] \ne [0, 1]$, then $a \ne 0$ in $A$, so that $[b, a] \in Q(A)$ and $[a, b][b, a] = [ab, ab] = [1, 1]$, i.e., $[a, b]^{-1} = [b, a]$ is the multiplicative inverse of $[a, b] \ne [0, 1]$. We have thus proved that $Q(A)$ is a field.
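The construction can be mirrored in a few lines of code for $A = \mathbb{Z}$. The class below is a sketch for illustration only: the name `Frac` is an assumption of this example, representatives are deliberately not reduced to lowest terms, and equality is decided by relation (1), so that the independence of the operations (2) from the chosen representatives can be observed directly.

```python
class Frac:
    """The class [a, b] of a pair (a, b) of integers with b != 0."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("second component must be non-zero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # Relation (1): [a, b] = [c, d]  iff  ad = bc.
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # First formula in (2): [a, b] + [c, d] = [ad + bc, bd].
        return Frac(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # Second formula in (2): [a, b][c, d] = [ac, bd].
        return Frac(self.a * other.a, self.b * other.b)

    def __neg__(self):
        return Frac(-self.a, self.b)

    def inverse(self):
        # [a, b]^(-1) = [b, a]; raises ValueError when a == 0.
        return Frac(self.b, self.a)

    def __repr__(self):
        return f"[{self.a}, {self.b}]"


x, y = Frac(1, 2), Frac(2, 4)          # two representatives of one class
print(x == y, x + Frac(1, 3) == y + Frac(2, 6))
print(Frac(3, 4) * Frac(3, 4).inverse() == Frac(1, 1))
```

Python's standard `fractions.Fraction` implements the same field but always stores the representative in lowest terms; the unreduced version above stays closer to the equivalence-class description in the text.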

The map $a \mapsto [a, 1]$ is an injective map $f : A \to Q(A)$ which is a ring homomorphism. (It is easy to check that $f(a + b) = f(a) + f(b)$, $f(ab) = f(a) f(b)$, and $a \ne b \implies f(a) \ne f(b)$.) For any element $x = [a, b] \in Q(A)$ we have

$$x = [a, 1][1, b] = [a, 1][b, 1]^{-1} = f(a) f(b)^{-1},$$

so that $x$ is the "ratio" $f(a)/f(b)$ of elements in $f(A)$. For this reason, $Q(A)$ is called the field of fractions of $A$.


It is convenient to identify every element $a \in A$ with its image $f(a) = [a, 1] \in Q(A)$, i.e., to identify $A$ with $f(A)$. We then call the element $[a, b]$ a fraction and write it in the usual form

$$\frac{a}{b} \quad \text{or} \quad a/b.$$

By now it should be clear that the above rules for operating with the equivalence classes $[a, b]$ repeat the rules for operating with fractions in a field (see (10) in Subsection 5 of §4 Ch. 4). We have proved

THEOREM 1. For every integral domain $A$ there exists a field of fractions $Q(A)$ whose elements have the form $a/b$, $a \in A$, $0 \ne b \in A$. The operations in $Q(A)$ are given by (1) and (2), where $[a, b]$ is replaced by $a/b$. $\square$


The construction of the field of fractions of an integral domain is used fairly often in mathematics. The simplest example $Q(\mathbb{Z}) = \mathbb{Q}$ shows that it is a very natural idea. It is easy to see (the reader should check this!) that $Q(A) = A$ if $A$ is a field.

Remark. It can be proved that, if the integral domain $A$ is a subring of a field $K$ in which every element $x$ can be written as a ratio of two elements of $A$, then $K = Q(A)$. For example, $\mathbb{Q}(\sqrt{d}) = Q(\mathbb{Z}[\sqrt{d}])$.


2. The field of rational functions. Let $K$ be a field, and let $K[X]$ be the ring of polynomials over $K$. The field of fractions $Q(K[X])$ of $K[X]$ is denoted $K(X)$ (i.e., square brackets are replaced by parentheses) and is called the field of rational functions of the variable $X$ with coefficients in $K$.

Note that the field of rational functions $K(X)$ always contains an infinite number of elements. The characteristic of this field is the same as the characteristic of $K$. Thus, $\mathbb{Z}_p(X)$ is an example of an infinite field of characteristic $p > 0$.

Every rational function in $K(X)$ can be written (in fact, in many ways) in the form $f/g$, where $f$ and $g$ are polynomials in $K[X]$ and $g \ne 0$. By definition $f/g = f_1/g_1 \iff f g_1 = f_1 g$. We call $f$ the numerator and $g$ the denominator of the rational function $f/g$. The rational function does not change if both the numerator and denominator are multiplied by the same non-zero polynomial or if a non-zero common factor is canceled. In particular the (positive, negative, or zero) integer $\deg f - \deg g$ depends only on the rational function, and not on the particular way it is written in the form $f/g$. This number is called the degree of the rational function. A rational function is said to be in lowest terms if its numerator is relatively prime to its denominator. Up to multiplication by a non-zero element of $K$ in the numerator and denominator, any rational function can be uniquely written as the ratio of polynomials in lowest terms. Namely, if we have the rational function in the form $f/g$, we can divide $f$ and $g$ by $\operatorname{g.c.d.}(f, g)$ to obtain a fraction in lowest terms; and if $f/g$ and $f_1/g_1$ are two rational functions in lowest terms which are equal, we have $f g_1 = f_1 g$, from which it follows that $f = c f_1$, $c \in K$, and $g = c g_1$ (use the corollary to Theorem 4 of §3).

If $\deg(f/g) = \deg f - \deg g < 0$ and $f/g$ is in lowest terms, we call $f/g$ a proper fraction. (The zero polynomial is also considered to be a proper fraction, since by convention $\deg 0 = -\infty$.)


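THEOREM 2. Every rational function in $K(X)$ can be uniquely written as the sum of a polynomial and a proper fraction.

Proof. If we apply the division algorithm to the numerator and denominator of $f/g$, we obtain $f = qg + r$, where $\deg r < \deg g$. Then $f/g = q + r/g$ is written in the desired form. If we also have $f/g = \tilde{q} + \tilde{r}/\tilde{g}$ ($\tilde{q}, \tilde{r}, \tilde{g} \in K[X]$, $\deg \tilde{r} < \deg \tilde{g}$), then we obtain

$$\tilde{q} - q = r/g - \tilde{r}/\tilde{g} = (r\tilde{g} - \tilde{r}g)/g\tilde{g}.$$

Since $\tilde{q} - q \in K[X]$ and

$$\deg\big((r\tilde{g} - \tilde{r}g)/g\tilde{g}\big) = \deg(r\tilde{g} - \tilde{r}g) - \deg g - \deg \tilde{g} < 0,$$

this can only occur if $\tilde{q} - q = 0$ and $r/g = \tilde{r}/\tilde{g}$. $\square$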
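The existence half of the proof is just polynomial long division. The sketch below carries it out over $K = \mathbb{Q}$, with polynomials stored as coefficient lists (lowest degree first) and exact arithmetic supplied by `fractions.Fraction`; the function name `poly_divmod` is chosen here for illustration, and the divisor is assumed to have a non-zero leading coefficient.

```python
from fractions import Fraction


def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g, over Q.

    Polynomials are coefficient lists, lowest degree first; g[-1] must be
    non-zero. The zero polynomial is returned as an empty list for r.
    """
    g = [Fraction(c) for c in g]
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]             # cancel the leading term of r
        q[shift] = coeff
        for i, c in enumerate(g):
            r[shift + i] -= coeff * c
        while r and r[-1] == 0:           # drop vanished leading terms
            r.pop()
    return q, r


# (X^3 + 1)/(X^2 + 1) = X + (1 - X)/(X^2 + 1)
q, r = poly_divmod([1, 0, 0, 1], [1, 0, 1])
print(q, r)
```

Here `q` holds the polynomial part and `r` the numerator of the remaining fraction $r/g$, whose degree is negative as Theorem 2 requires.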


3. Primary rational functions. A proper rational function $f/g \in K(X)$ is called primary if $g = p^n$, $n \ge 1$, where $p = p(X)$ is an irreducible polynomial and $\deg f < \deg p$.

The fundamental theorem on rational functions is

THEOREM 3. Every proper rational function can be uniquely represented as a sum of primary rational functions.


The proof of Theorem 3 is divided into two parts -- existence and uniqueness of 


the desired representation. 


I. Let $f/g \in K(X)$ be the given proper rational function. Without loss of generality we may assume that $g$ is monic. Suppose that $g = g_1 g_2$ is the product of two relatively prime monic polynomials. According to the results of §3, we have the relation

$$1 = u_1 g_1 + u_2 g_2$$

for some $u_1, u_2 \in K[X]$. Multiplying both sides by $f$, we obtain

$$f = f u_1 g_1 + f u_2 g_2.$$

If $f u_1 = q g_2 + v_1$ with $\deg v_1 < \deg g_2$, then

$$f = v_1 g_1 + v_2 g_2, \tag{3}$$

where $v_2 = q g_1 + f u_2$. Since $\deg v_1 g_1 < \deg g$ and $\deg f < \deg g$ (because the rational function $f/g$ is proper), it follows that (3) can only hold if $\deg v_2 < \deg g_1$.

Dividing both sides of (3) by $g_1 g_2$, we obtain an expression for $f/g$ as a sum of proper rational functions:

$$f/g = v_2/g_1 + v_1/g_2.$$


If either g, or 8, can be written as a product of two relatively prime polynomials, then 


we repeat the procedure. We finally arrive at the expression 


258 


f m a, 
1 
a > = (4) 
? 
g i=1 a 


Pi 


where g.c.d. (a,, P,) = 1 and deg a, & a deg P; foreach i. Here the denominators 


n, 
are the powers Pp, of the monic irreducible polynomials in the factorization of ¢g: 


n n n 
m 


S = Py Py vee Py (S) 


(p, Ap, for i#)). 
We now further decompose each proper fraction a/p^n of this type (deg a < n deg p).
The division algorithm gives us the sequence of equations

    a       = q_1 p^{n-1} + r_1,
    r_1     = q_2 p^{n-2} + r_2,
    . . . . . . . . . . . . . .
    r_{n-2} = q_{n-1} p + r_{n-1},
    r_{n-1} = q_n,

where deg q_i < deg p for all i. Since these equations give

    a = q_1 p^{n-1} + q_2 p^{n-2} + ... + q_{n-1} p + q_n,

we have

    a/p^n = q_1/p + q_2/p^2 + ... + q_{n-1}/p^{n-1} + q_n/p^n.

Since deg q_i < deg p, the rational functions q_i/p^i are primary, so we have the
desired expression for f/g if we write each term in (4) in this form.
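The expansion of a/p^n into primary fractions can be sketched in a few lines. The version below is an equivalent reformulation, not the sequence of divisions written above: it divides repeatedly by p itself, which produces q_n first and q_1 last. The names `poly_divmod` and `primary_parts`, and the highest-degree-first coefficient lists, are assumptions of this illustration.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r over Q; coefficients highest degree first."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    if not g or g[0] == 0:
        raise ValueError("divisor must have a non-zero leading coefficient")
    q, r = [], f[:]
    while len(r) >= len(g):
        coeff = r[0] / g[0]
        q.append(coeff)
        for i in range(len(g)):
            r[i] -= coeff * g[i]
        r.pop(0)
    return q, r

def primary_parts(a, p, n):
    """Return [q_1, ..., q_n] with a/p^n = q_1/p + q_2/p^2 + ... + q_n/p^n."""
    digits = []
    rest = [Fraction(c) for c in a]
    for _ in range(n):
        rest, rem = poly_divmod(rest, p)
        digits.append(rem)          # q_n first, then q_{n-1}, and so on
    if any(c != 0 for c in rest):
        raise ValueError("a/p^n is not a proper fraction")
    return digits[::-1]

# X^3/(X^2 + 1)^2 = X/(X^2 + 1) + (-X)/(X^2 + 1)^2
parts = primary_parts([1, 0, 0, 0], [1, 0, 1], 2)
```

Each entry of `parts` has degree less than deg p, so every term q_i/p^i is primary.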


II. We now prove uniqueness. Suppose that, in addition to the expression derived
above

    f/g = \sum_{i=1}^{m} \sum_{j=1}^{n_i} a_{ij} / p_i^j           (6)

for the proper rational function f/g as a sum of primary rational functions, we had
another such expression

    f/g = \sum_{k} \sum_{l} b_{kl} / q_k^l,

perhaps involving terms b_{kl}/q_k^l whose denominator q_k does not occur in (6). But if
we add terms with zero values for a_{ij} and b_{kl} to the two expressions for f/g, we
may assume that the denominators are the same in the two expressions. Then, subtracting
them, we obtain

    \sum_{i=1}^{M} \sum_{j=1}^{N_i} (a_{ij} - b_{ij}) / p_i^j = 0.  (7)

Here we have M instead of m because of the possible addition of terms in which some
of the q_k are taken for the p_i (namely, when i > m). The N_i are chosen so that

    a_{i N_i} - b_{i N_i} ≠ 0.                                     (8)

Multiplying (7) by p_1^{N_1} p_2^{N_2} ... p_M^{N_M}, we obtain the polynomial identity

    (a_{1 N_1} - b_{1 N_1}) \prod_{i=2}^{M} p_i^{N_i} + p_1 u = 0.

The exact nature of the polynomial u does not concern us. The important thing is that
this equality implies that p_1 divides (a_{1 N_1} - b_{1 N_1}) \prod_{i=2}^{M} p_i^{N_i}. But
g.c.d.(p_1, \prod_{i=2}^{M} p_i^{N_i}) = 1, so that we must have p_1 | (a_{1 N_1} - b_{1 N_1}). However,
deg(a_{1 N_1} - b_{1 N_1}) ≤ max(deg a_{1 N_1}, deg b_{1 N_1}) < deg p_1. Hence
a_{1 N_1} - b_{1 N_1} = 0, which contradicts (8). □
The proof of Theorem 3 is completely constructive once we know the factorization 
(4), and can be used to actually write a proper rational function as a sum of primary 
rational functions. 


Note that if g = (X - c)^n h(X) and h(c) ≠ 0, then for any b_n

    f/g = b_n/(X - c)^n + (f - b_n h)/((X - c)^n h).

Setting b_n = f(c)/h(c), we obtain f(c) - b_n h(c) = 0, and hence f - b_n h = (X - c) f_{n-1}
for some f_{n-1} (see §1 and also Chapter 6). Thus,

    (f - b_n h)/((X - c)^n h) = f_{n-1}/((X - c)^{n-1} h).

Applying the same procedure to this rational function, we again lower the power of X - c
in the denominator by one, and so on. After n steps we arrive at the expansion

    f/g = f_0/h + \sum_{i=1}^{n} b_i/(X - c)^i,    b_i ∈ K.

If h (and so also g = (X - c)^n h) decomposes completely into linear factors,
then, if we go through this procedure for each factor (X - c_i) in g, we eventually
obtain the expansion for f/g in the theorem.
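As an illustration of this procedure, here is a small worked example over ℚ. The particular fraction, and the names b_2, b_1 for the two coefficients it produces, are chosen only for this sketch. Take f = 1 and g = (X - 1)^2 (X + 1), so that c = 1, n = 2 and h = X + 1.

```latex
\begin{align*}
b_2 &= \frac{f(1)}{h(1)} = \frac12, &
1 - \tfrac12 (X+1) &= (X-1)\cdot\left(-\tfrac12\right),\\
b_1 &= \frac{-1/2}{h(1)} = -\frac14, &
-\tfrac12 + \tfrac14 (X+1) &= (X-1)\cdot\tfrac14,\\[4pt]
\frac{1}{(X-1)^2 (X+1)} &= \frac{1/2}{(X-1)^2} - \frac{1/4}{X-1} + \frac{1/4}{X+1}.
\end{align*}
```

Substituting X = 0 gives 1 on the left and 1/2 + 1/4 + 1/4 = 1 on the right, which is a quick check of the coefficients.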

If we know that all of the irreducible factors p_i in (4) are linear or quadratic
(this will be the case if K = ℝ), then the primary rational functions will have the form

    d/(X - c)^n    or    (aX + b)/(X^2 + pX + q)^n;    a, b, c, d, p, q ∈ K,      (9)

and it is also convenient to use the so-called method of undetermined coefficients. One first
writes f/g as a sum of fractions of the form (9), then multiplies both sides by g,
gathers together terms with the same power of X, and equates coefficients of each power
of X on both sides of the equation. In the next chapter we shall see that this procedure
always works if K is the field of real or complex numbers. And it is in ℝ(X) and
ℂ(X) that we often want to decompose into primary rational functions, because this is a
necessary step whenever we integrate rational functions. This technique is also called
"partial fraction decomposition".


EXERCISES

1. Construct the field of fractions ℝ((X)) of the ring ℝ[[X]] of formal
power series in X with real coefficients. Using Exercise 6 of §3, show that every
element of the field ℝ((X)) has the form of a so-called meromorphic power series

    φ(X) = a_{-m} X^{-m} + a_{-m+1} X^{-m+1} + ... + a_{-1} X^{-1} + a_0
           + a_1 X + a_2 X^2 + ...,    a_i ∈ ℝ,

in which we allow a finite number of terms with negative exponents. In other words, we can
write φ(X) = X^{-m} f(X), where f(X) is an ordinary power series in ℝ[[X]].

2. Let ℝ(X,Y) (respectively, ℝ((X,Y))) denote the field of fractions of the
polynomial ring ℝ[X,Y] (respectively, of the integral domain ℝ[[X,Y]]; see
Exercise 7 of §2). Show that

    ℝ(X,Y) = (ℝ(X))(Y) = (ℝ[X])(Y).

Are the fields ℝ((X,Y)) and (ℝ((X)))((Y)) isomorphic?

3. Suppose that the infinite sequence of real numbers a_0, a_1, a_2, ... is periodic
from some point on. Show that the power series f(X) = a_0 + a_1 X + a_2 X^2 + ... can be
written as a rational function in ℝ(X).

4. Let R be a commutative ring with unit 1 (not necessarily an integral
domain), let M be a submonoid of the multiplicative monoid of R, and let S =
R × M. Prove that the following binary relation T ⊂ S^2 is an equivalence relation
on S: T = {((a,b),(c,d)) ∈ S^2 | (ad - bc)u = 0 for some u ∈ M}.

5. Copying the proof of Theorem 1, show that the quotient set S/T, where
T is the equivalence relation in Problem 4, has the structure of a commutative ring
with unit. This ring Q_M(R) is called the ring of fractions of R relative to M.
If R is an integral domain and M = R*, then we obtain the usual field of fractions
Q(R).


Chapter 6. Roots of Polynomials 


We now take up what used to be the raison d'etre of algebra, the roots of polynomials. 
This subject has ceased to dominate algebra, but its importance is indisputable. After all, 
many problems in mathematics ultimately boil down to the determination of the roots of 
certain specific polynomials or some information about the set of roots. We shall only be 
able to discuss the simplest properties of roots, but those properties will be enough to
convey the full importance of the special place occupied by the field ℂ of complex numbers.

§1. General properties of roots


1. Roots and linear factors. Let A be a commutative ring with unit that is
contained in an integral domain R.

Definition. An element c ∈ R is called a root (or a zero) of the polynomial
f ∈ A[X] if f(c) = 0. We also say that c is a root of the equation f(x) = 0.
It is clear why we have to consider roots lying in a larger ring R than the ring A
in which we found all of the coefficients of f. Namely, remember the polynomial
f(X) = X^2 + 1 ∈ ℝ[X], which does not have any roots in ℝ but has the two roots i
and -i in ℂ.

But we first consider the case R = A.

THEOREM 1 (Bezout's Theorem). An element c ∈ A is a root of f ∈ A[X] if
and only if X - c divides f in the ring A[X].

Proof. This theorem follows from a more general assertion that we could have
proved much earlier. Namely, the division algorithm (Theorem 5 of §2 Ch. 5) says that
f(X) = (X - c)q(X) + r(X), where deg r(X) < deg(X - c) = 1. Hence, r(X) is a
constant. Substituting c in place of X (i.e., applying the substitution map of Theorem 2 of
§2 Ch. 5) gives f(c) = r, so that we always have

    f(X) = (X - c)q(X) + f(c).                                    (1)

In particular, f(c) = 0 ⟺ f(X) = (X - c)q(X). □
Dividing a polynomial f(X) with coefficients in an integral domain A by a linear
polynomial X - c is best done by the so-called Horner method (also known as "synthetic
division"), which is simpler than the division algorithm. Namely, let

    f(X) = a_0 X^n + a_1 X^{n-1} + ... + a_{n-1} X + a_n.

According to formula (1), f(X) = (X - c)q(X) + f(c), where

    q(X) = b_0 X^{n-1} + b_1 X^{n-2} + ... + b_{n-2} X + b_{n-1}.

Comparing the coefficients of each power of X on both sides of formula (1) (beginning
with the highest powers), after a little rearranging we obtain

    b_0 = a_0,   b_k = c b_{k-1} + a_k  (k = 1, ..., n-1),   f(c) = c b_{n-1} + a_n,      (2)

so that we have also computed the value of f at X = c. These formulas are convenient to
use in computations.
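Horner's scheme translates directly into a short loop. The sketch below is only an illustration: the function name `horner` is hypothetical, and the coefficients are assumed to be listed from the leading coefficient a_0 down to a_n, with integer (or any exact) arithmetic.

```python
def horner(coeffs, c):
    """Divide f = a_0 X^n + ... + a_n by X - c using Horner's method.

    coeffs = [a_0, a_1, ..., a_n]. Returns (quotient coefficients, f(c)),
    where the quotient is listed from its leading coefficient down.
    """
    if not coeffs:
        return [], 0
    b = [coeffs[0]]                 # b_0 = a_0
    for a in coeffs[1:]:
        b.append(c * b[-1] + a)     # next value = c * previous value + a_k
    return b[:-1], b[-1]            # the last value computed is f(c)

# f = X^3 - 6X^2 + 11X - 6 = (X - 2)(X^2 - 4X + 3), so f(2) = 0
quotient, value = horner([1, -6, 11, -6], 2)
```

Only n multiplications and n additions are needed, and the quotient q(X) comes out as a by-product of evaluating f(c).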


In view of Theorem 1 it is natural to introduce a more general

Definition. An element c ∈ A is called a k-fold root (or k-fold zero) of
f ∈ A[X] if f is divisible by (X - c)^k but is not divisible by (X - c)^{k+1}. The number k is called
the multiplicity of the root. A 1-fold root is called a simple root, and 2- and 3-fold
roots are called double and triple roots, respectively.

Thus, c ∈ A is a root of multiplicity k of f ∈ A[X] if and only if
f(X) = (X - c)^k g(X), where g.c.d.(X - c, g(X)) = 1. By formula (1), the latter condition
can also be expressed as: g(c) ≠ 0. Note that, by Theorem 1 of §2 Ch. 5 we have
deg f = k + deg g, so that k ≤ deg f. We now prove the important


THEOREM 2. Let A be an integral domain, let f ≠ 0 be a polynomial in
A[X], and let c_1, ..., c_r ∈ A be roots of f of multiplicities k_1, ..., k_r,
respectively. Then

    f(X) = (X - c_1)^{k_1} ... (X - c_r)^{k_r} g(X),

    g(X) ∈ A[X],   g(c_i) ≠ 0,   i = 1, ..., r.

In particular, the number of roots of a polynomial f ∈ A[X] counting multiplicity
does not exceed the degree of the polynomial, i.e.,

    k_1 + ... + k_r ≤ deg f.                                      (3)

Proof. Theorem 2 can be proved by considering f as a polynomial over the field
of fractions Q(A) and then using the uniqueness of the factorization of f in the ring
Q(A)[X], i.e., the uniqueness of the prime factors X - c_1, ..., X - c_r. But there is no
real need to use all these general facts (the results of §§3 and 4 of Ch. 5); we shall give a
direct argument.

We shall prove that f is divisible by (X - c_1)^{k_1} ... (X - c_r)^{k_r} by induction on r;
after that, the inequality (3) will follow because deg f = k_1 + ... + k_r + deg g. If
r = 1, the assertion follows directly from the definition of a multiple root. Suppose that
we already know that

    f(X) = (X - c_1)^{k_1} ... (X - c_{r-1})^{k_{r-1}} h(X).

Since we have c_r - c_1 ≠ 0, ..., c_r - c_{r-1} ≠ 0, and A is an integral domain, it follows
that c_r is not a root of the polynomial (X - c_1)^{k_1} ... (X - c_{r-1})^{k_{r-1}}. But c_r is a
k_r-fold root of f, i.e., f(X) = (X - c_r)^{k_r} u(X). Hence h(c_r) = 0. Let c_r be an
s-fold root of h, so that h(X) = (X - c_r)^s v(X), s ≤ k_r. We have

    (X - c_r)^{k_r} u(X) = f(X) = (X - c_1)^{k_1} ... (X - c_{r-1})^{k_{r-1}} (X - c_r)^s v(X).

Using the cancellation law in the integral domain A[X], we come to the conclusion that
s = k_r, which completes the induction step. This proves Theorem 2. □


If we don't assume that A is an integral domain, Theorem 2 is no longer true.
For example, let A = ℤ_8 and let f(X) = X^3. Then f(0) = f(2) = f(4) = f(6) = 0,
so the cubic polynomial f has four roots. In addition, the factorization of f in ℤ_8[X]
is not unique; for example, X^3 = X · X^2 = X(X - 4)^2.
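The failure of Theorem 2 over ℤ_8 is easy to confirm by brute force. The following sketch is only an illustration: the helper `poly_mul_mod` and the lowest-degree-first coefficient lists are assumptions made here, and X(X - 4)^2 is simply one alternative factorization of X^3 chosen for the check.

```python
def poly_mul_mod(f, g, m):
    """Multiply two polynomials with coefficients in Z_m (lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out

# All roots of X^3 in Z_8: four roots for a polynomial of degree three.
roots = [c for c in range(8) if pow(c, 3, 8) == 0]

# X * (X - 4)^2 in Z_8[X]; X - 4 is written as [4, 1] because -4 = 4 in Z_8.
x = [0, 1]
x_minus_4 = [4, 1]
product = poly_mul_mod(x, poly_mul_mod(x_minus_4, x_minus_4, 8), 8)
```

The list `product` is the coefficient list of X^3, so X · X^2 and X(X - 4)^2 are two different factorizations of the same polynomial.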


Theorem 2 implies the following

COROLLARY. Let A be an integral domain, and let f, g ∈ A[X] be two
polynomials of degree ≤ n. If f and g take the same value when n+1 different
elements of A are substituted in place of X, then f = g.

Proof. Set h = f - g, so that deg h ≤ n. By assumption, h(c_1) = ... =
h(c_{n+1}) = 0 for distinct elements c_1, ..., c_{n+1} ∈ A, i.e., h has at least n+1
roots. But, since deg h ≤ n, by (3) this can only happen if h = 0, i.e., f = g. □


2. Polynomial functions. The corollary to Theorem 2 allows us to answer the
question mentioned before (see Subsection 1 of §2 Ch. 5) of the relationship between the
function theoretic and algebraic points of view on polynomials. Every polynomial f ∈ A[X]
corresponds to a function

    \tilde{f} : a ↦ f(a),   ∀ a ∈ A.

The set of all such functions is a ring A_pol, called the ring of polynomial functions. It is
a subring of the ring A^A = {A → A} of all functions from A to A with point-wise
addition and multiplication (see Example 3 in Subsection 1 of §4 Ch. 4 and also
Theorem 2 of §2 Ch. 5). Polynomial functions in several variables are defined in a
completely analogous manner.

We have already mentioned, by way of example, that the polynomial
X^2 + X ∈ F_2[X] gives the zero function. In general, if f(X) ∈ F_p[X] is a polynomial
having the form (X^p - X) g(X), then the corresponding function \tilde{f} is the zero function,
since x^p - x = x(x^{p-1} - 1) = 0 for all x in the field of p elements. It is only when
deg f ≤ p-1 that we can say that a polynomial f ∈ F_p[X] is determined once the
function \tilde{f} is known. An arbitrary polynomial f ∈ F_p[X] can be replaced by a uniquely
determined reduced polynomial f* of degree ≤ p-1 by taking f* to be the remainder
when f is divided by X^p - X. Then, obviously, \tilde{f} = \tilde{f*}.
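The passage from f to its reduced polynomial f* can be sketched as follows. This is a minimal illustration with hypothetical names (`reduce_mod_xp_minus_x`, `evaluate`); instead of carrying out a long division, it uses the fact that X^i and X^j leave the same remainder modulo X^p - X whenever i, j ≥ 1 and i ≡ j modulo p-1.

```python
def reduce_mod_xp_minus_x(coeffs, p):
    """Remainder of f modulo X^p - X over Z_p; coeffs[i] is the coefficient of X^i."""
    reduced = [0] * p
    for i, a in enumerate(coeffs):
        # X^i is congruent to X^j, where j = i for i < p, and otherwise j is
        # the element of {1, ..., p-1} congruent to i modulo p - 1.
        j = i if i < p else (i - 1) % (p - 1) + 1
        reduced[j] = (reduced[j] + a) % p
    return reduced

def evaluate(coeffs, x, p):
    """Value of the polynomial at x, computed in Z_p."""
    return sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p

# f = X^5 + 2X^3 + 1 over Z_3 defines the constant function 1.
f = [1, 0, 0, 2, 0, 1]
f_star = reduce_mod_xp_minus_x(f, 3)
same_function = all(evaluate(f, x, 3) == evaluate(f_star, x, 3) for x in range(3))
```

The result `f_star` always has exactly p entries, i.e. degree at most p-1, and it defines the same function on the field of p elements as f.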


But the situation is much simpler for infinite fields.

THEOREM 3. If A is an integral domain with an infinite number of elements,
then the map f ↦ \tilde{f} from A[X] to A_pol is an isomorphism of rings.

Proof. This follows immediately from the corollary to Theorem 2, since we need
only check that \tilde{f} ≠ 0 if f ≠ 0; if deg f = n, f ≠ 0, then the corollary to
Theorem 2 says that f can have at most n zeros in A, so that it is impossible to
have f(a) = 0 for all a ∈ A. □


Because of Theorem 3, whenever we have an infinite field K we identify
polynomials f(X) over K with the corresponding polynomial functions f(x) (where we
use a small x to emphasize that we are thinking of f as a function). The question
remains of how, in practice, one reconstructs a polynomial f if one knows several values
of f(x).

This so-called "interpolation" problem is stated more precisely as follows. Let
b_0, b_1, ..., b_n be n+1 elements of the field K, and let c_0, c_1, ..., c_n be
n+1 distinct elements of K. We want to find a polynomial f ∈ K[X] of degree ≤ n
such that f(c_i) = b_i for i = 0, 1, ..., n. According to the corollary to Theorem 2,
if a solution to this problem exists, then it is unique. But there is always a solution to the
problem, given by the Lagrange interpolation formula

    f(X) = \sum_{i=0}^{n} b_i \frac{(X - c_0) ... (X - c_{i-1})(X - c_{i+1}) ... (X - c_n)}
                                   {(c_i - c_0) ... (c_i - c_{i-1})(c_i - c_{i+1}) ... (c_i - c_n)}.      (4)


By the way, the existence and uniqueness of the required f(X) = a_0 X^n + a_1 X^{n-1} + ... + a_n
can also be seen by considering the linear system

    a_0 c_i^n + a_1 c_i^{n-1} + ... + a_n = b_i,    i = 0, 1, ..., n,

for the unknown coefficients a_0, ..., a_n. The determinant of this system is the
Vandermonde determinant, which is non-zero, so that the solution can be found by Cramer's
rule. But the formula (4) is more convenient, because it is simple and easily remembered.
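The Lagrange formula can be evaluated term by term exactly as it is written. The sketch below works over ℚ with exact fractions; the function name `lagrange_eval` and the representation of the data as a list of pairs (c_i, b_i) with integer entries are assumptions of this illustration.

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Value at x of the polynomial of degree <= n through the given points.

    points = [(c_0, b_0), ..., (c_n, b_n)] with pairwise distinct c_i.
    """
    total = Fraction(0)
    for i, (ci, bi) in enumerate(points):
        term = Fraction(bi)
        for j, (cj, _) in enumerate(points):
            if j != i:
                # one factor (x - c_j)/(c_i - c_j) of the i-th summand
                term *= Fraction(x - cj, ci - cj)
        total += term
    return total

# The points (0, 1), (1, 2), (2, 5) lie on f = X^2 + 1.
points = [(0, 1), (1, 2), (2, 5)]
value_at_3 = lagrange_eval(points, 3)
```

Each summand vanishes at every c_j with j ≠ i and equals b_i at c_i, which is why the sum takes the prescribed values.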


In some situations it is more useful to use the Newton interpolation formula

    f(X) = u_0 + u_1 (X - c_0) + u_2 (X - c_0)(X - c_1) + ...
           + u_n (X - c_0)(X - c_1) ... (X - c_{n-1}),                            (5)

where the coefficients u_0, ..., u_n are found by successively substituting the values
X = c_0, X = c_1, ..., X = c_n. The interpolation formulas (4) and (5) are used in
practical applications when computing and graphing a function φ: ℝ → ℝ based on a
table or experimental data. If one somehow knows that the function φ on an interval I
of the real number line behaves like a very "smooth" function, perhaps a polynomial function,
one can find a polynomial function that agrees with φ for the known values. The so-called
"interpolation points" are the (c_i, b_i) for c_i which fall in the interval I; since
φ(c_i) = b_i, the desired polynomial must pass through these points. Of course, we have
just presented the simplest type of interpolation. Entire fields of mathematics are devoted
to the delicate questions which arise in selecting the interpolation points and
applying the various methods of interpolation. It should also be mentioned that interpolation
methods have played an important role in the theory of transcendental numbers (for the
definition of algebraic and transcendental numbers, see §2 Ch. 5), so we can say that
function theorists, number theorists, and algebraists all have an interest in this subject.
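The successive substitutions that determine the coefficients of the Newton interpolation formula can be sketched as follows. This is an illustrative implementation over ℚ with hypothetical names (`newton_coefficients`, `newton_eval`); the data are again assumed to be given as pairs (c_i, b_i) with distinct c_i.

```python
from fractions import Fraction

def newton_coefficients(points):
    """Return [u_0, ..., u_n], found by substituting X = c_0, c_1, ..., c_n in turn."""
    u = []
    for k, (ck, bk) in enumerate(points):
        partial = Fraction(0)   # u_0 + u_1 (c_k - c_0) + ... up to index k-1
        basis = Fraction(1)     # running product (c_k - c_0) ... (c_k - c_{i-1})
        for i in range(k):
            partial += u[i] * basis
            basis *= ck - points[i][0]
        # basis is now (c_k - c_0) ... (c_k - c_{k-1}), non-zero for distinct c_i
        u.append((Fraction(bk) - partial) / basis)
    return u

def newton_eval(points, u, x):
    """Evaluate u_0 + u_1 (x - c_0) + ... + u_n (x - c_0) ... (x - c_{n-1})."""
    total = Fraction(0)
    basis = Fraction(1)
    for (ci, _), ui in zip(points, u):
        total += ui * basis
        basis *= x - ci
    return total

points = [(0, 1), (1, 2), (2, 5)]
u = newton_coefficients(points)      # 1 + X + X(X - 1) = X^2 + 1
value_at_3 = newton_eval(points, u, 3)
```

One practical feature of this form is that adding a new interpolation point only appends one more coefficient; the earlier u_i are unchanged.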
Finally, we note that whenever we are given a rational function f/g ∈ K(X) in
lowest terms (see §4 of Ch. 5) and an extension F ⊃ K having infinitely many elements,
there is a corresponding function \widetilde{(f/g)}_F whose domain is the set obtained from F by
deleting all of the (finitely many) zeros of g in F and whose range is F. It can be
shown that distinct f/g always give distinct functions \widetilde{(f/g)}_F. But we shall not need
this assertion. One should be careful not to confuse the two notions of an element of K(X)
and the corresponding function. For example, the rational function x ↦ 1/x is not
defined at the point x = 0, but it makes no sense to ask whether the element 1/X ∈ K(X)
is or is not defined.


3. Differentiation in polynomial rings. If we think of polynomials as functions, we
naturally come upon the following definition. Let

    f(X) = a_0 X^n + a_1 X^{n-1} + ... + a_{n-1} X + a_n

be a polynomial of degree n over a field K. By its derivative, we mean the polynomial

    f'(X) = n a_0 X^{n-1} + (n-1) a_1 X^{n-2} + ... + a_{n-1}.                    (6)




If K = ℝ is the field of real numbers, and \tilde{f} is the polynomial function
corresponding to f, then (6) coincides with the usual definition of the derivative as
the limit

    \lim_{Δx → 0} \frac{f(x + Δx) - f(x)}{Δx}.

But in the case of an arbitrary field K there is no sense in speaking of limits of values of
a function (for example, what would this mean in the case of the finite field ℤ_p?), so it is
necessary to use the formal definition (6).
In this abstract setting we still have the well-known formulas from calculus:

    (αf + βg)' = αf' + βg',    α, β ∈ K,                                          (7)

    (fg)' = f'g + fg'.                                                            (8)

Relation (7) follows immediately from (6) and the definition of the sum of two polynomials.
If we use (7) and the definition of the product of two polynomials, we can reduce the
verification of (8) to the case when f = X^k and g = X^l, in which case we have:

    (X^{k+l})' = (k+l) X^{k+l-1} = (k X^{k-1}) X^l + X^k (l X^{l-1}) = (X^k)' X^l + X^k (X^l)'.

As a generalization of (8), we have the following formula, which is easy to prove by
induction on k:

    (f_1 f_2 ... f_k)' = \sum_{i=1}^{k} f_1 ... f_{i-1} f_i' f_{i+1} ... f_k.

In particular,

    (f^k)' = k f^{k-1} f'.                                                        (9)


If we re-write (7) and (8) in terms of the "differentiation operator" d/dX : f ↦ f',
it might occur to us to consider for any ring R (not necessarily a polynomial ring) maps
D : R → R having the two properties

    D(u + v) = Du + Dv,                                                           (7')

    D(uv) = (Du)v + u(Dv).                                                        (8')

Any such map from a ring R to itself is called a differentiation. The set Der(R) of
such maps is the point of departure for a major branch of mathematics (Lie groups and
Lie algebras).


A generalization of (8') is Leibniz' rule

    D^m(uv) = \sum_{k=0}^{m} \binom{m}{k} D^k u · D^{m-k} v,                       (8'')

which is proved by induction on m for m > 1 (since, if we apply D to (8'') and use
(8') and the equality \binom{m}{k} + \binom{m}{k-1} = \binom{m+1}{k}, we obtain (8'') with m+1 in place of m).
If R = K[X] and the differentiation D satisfies D(λf) = λ Df for λ ∈ K, in
addition to (7') and (8'), then it is easy to see using these rules that

    D f(X) = f'(X) · DX.

Thus, any such differentiation of the polynomial ring K[X] is determined once we
know the one polynomial DX. If DX = 1, then D is the usual differential operator
d/dX.

4. Multiple factors. The result of differentiating f(X) m times is denoted
f^{(m)}(X). It is clear that, if f(X) = a_0 X^n + a_1 X^{n-1} + ... + a_n, then f^{(n)}(X) = n! a_0
and f^{(n+1)}(X) = 0. In addition, if K is a field of characteristic 0, then
deg f' = deg f - 1. But this is not true if K is a field of characteristic p, since, for
example,

    (X^{kp})' = kp X^{kp-1} = 0.


But even in the general case (allowing fields of characteristic p) some information
about polynomials can be obtained by looking at their derivatives. If we divide an arbitrary
polynomial f ∈ K[X] by (X - c)^2, where c is an element in some field extension
F ⊃ K, and then write the remainder in the form (X - c)s + r, where s, r ∈ F, we
obtain: f = (X - c)^2 t + (X - c)s + r, where t ∈ F[X], s, r ∈ F. Differentiating gives:
f' = (X - c)[2t + (X - c)t'] + s. If we substitute X = c, we obtain r = f(c),
s = f'(c), i.e.,

    f(X) = (X - c)^2 t(X) + (X - c) f'(c) + f(c).

We have proved the following assertion.

THEOREM 4. Let K be any field, and let F be an extension of K. A
polynomial f ∈ K[X] has c ∈ F for a multiple root if and only if f(c) = f'(c) = 0. □


Example 1. In any field of characteristic p, the polynomial X^n - 1 only has
simple roots as long as n is not divisible by p, since the roots of the derivative
nX^{n-1} (namely, 0) cannot at the same time be roots of X^n - 1.
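The criterion f(c) = f'(c) = 0 can be checked by brute force over a small prime field. The following sketch uses the formal derivative with coefficients reduced modulo p; the function names and the lowest-degree-first coefficient lists are assumptions of this illustration, and only roots lying in ℤ_p itself (not in an extension) are examined.

```python
def derivative_mod_p(coeffs, p):
    """Formal derivative over Z_p; coeffs[i] is the coefficient of X^i."""
    return [(i * a) % p for i, a in enumerate(coeffs)][1:]

def evaluate(coeffs, x, p):
    """Value of the polynomial at x, computed in Z_p."""
    return sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p

def multiple_roots(coeffs, p):
    """Elements c of Z_p with f(c) = f'(c) = 0."""
    d = derivative_mod_p(coeffs, p)
    return [c for c in range(p)
            if evaluate(coeffs, c, p) == 0 and evaluate(d, c, p) == 0]

x4_minus_1 = [4, 0, 0, 0, 1]       # X^4 - 1 over Z_5 (note -1 = 4)
x5_minus_1 = [4, 0, 0, 0, 0, 1]    # X^5 - 1 over Z_5

no_multiple = multiple_roots(x4_minus_1, 5)
with_multiple = multiple_roots(x5_minus_1, 5)
```

For n = 4 the derivative 4X^3 vanishes only at 0, which is not a root, so all roots are simple; for n = 5 the derivative is identically zero and the root 1 is multiple, in agreement with X^5 - 1 = (X - 1)^5 over ℤ_5.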
Now suppose that K is a field of characteristic zero. (The reader will not
sacrifice anything if he takes K to be ℚ, ℝ, or ℂ.) The monic irreducible
polynomial p_i(X) in the factorization

    f(X) = p_1(X)^{k_1} p_2(X)^{k_2} ... p_r(X)^{k_r}                              (10)

is called a k_i-fold factor of f (by analogy with k-fold roots). We mentioned earlier
that, in practice, it is rather difficult to obtain the factorization (10) of a given polynomial.
We shall briefly describe a method, based on the derivative, which allows us to determine
whether or not f(X) has multiple factors over a given field K.

THEOREM 5. Let p(X) be a k-fold irreducible factor of a polynomial
f ∈ K[X], where k ≥ 1 and deg p ≥ 1. Then p(X) is a (k-1)-fold factor of the
derivative f'(X). In particular, if k = 1, then f' is not divisible by p(X).

Proof. By assumption, f(X) = p(X)^k g(X), where g.c.d.(p(X), g(X)) = 1,
i.e., g(X) is not divisible by p(X). Applying (8) and (9), we obtain:

    f'(X) = p(X)^{k-1} [k p'(X) g(X) + p(X) g'(X)].




It suffices to show that the polynomial in square brackets is not divisible by p(X). If 
p(X) did divide this polynomial, then it would also divide kp'(X) g(X), but this is 
impossible (see the corollaries to Theorems 3 and 4 in §3 Ch. 5), since g(X) is not 


divisible by p(X), and deg kp'(X) < deg p(X). □

Notice that, in proving Theorem 5, it was essential to know that p(X) is
irreducible and that char K = 0.


COROLLARY 1. If f(X) is a polynomial with coefficients in a field K of
characteristic zero, the following two conditions are equivalent:

(i) in some extension F ⊃ K, f has a root c of multiplicity k;
(ii) f^{(j)}(c) = 0 for 0 ≤ j ≤ k-1, but f^{(k)}(c) ≠ 0.

To prove the corollary, apply Theorem 5 k times with K replaced by F and
p(X) replaced by X - c. □


COROLLARY 2. If a polynomial f ∈ K[X] of degree ≥ 1 has factorization
(10), then the factorization of the greatest common divisor of f and its derivative f'
is

    g.c.d.(f, f') = p_1(X)^{k_1 - 1} p_2(X)^{k_2 - 1} ... p_r(X)^{k_r - 1}.        (11)

(Here we are taking the g.c.d. to be a monic polynomial, as we may always do.)

Proof. By Theorem 5, each of the prime divisors p_i(X) in the factorization of
f(X) (see (10)) occurs in the factorization of f'(X) with exponent k_i - 1, i.e.,

    f'(X) = p_1(X)^{k_1 - 1} p_2(X)^{k_2 - 1} ... p_r(X)^{k_r - 1} u(X),

where g.c.d.(u, p_i) = 1, 1 ≤ i ≤ r (we take p_i^0 = 1). Hence, by the divisibility
criterion in Subsection 2 of §3 Ch. 5, we conclude that g.c.d.(f, f') is given by (11). □




Using (11) for g.c.d.(f, f'), we can get rid of multiple factors in f by
considering the polynomial

    g(X) = f(X)/g.c.d.(f, f') = p_1(X) p_2(X) ... p_r(X).

This polynomial has the same prime divisors as f(X) but with multiplicity one. It is
important to note that the polynomial g(X) can be found without knowing the factorization
of f, namely, by using the Euclidean algorithm to find g.c.d.(f, f').
Example 2. The polynomial f(X) = X^5 - 3X^4 + 2X^3 + 2X^2 - 3X + 1 and its
derivative f'(X) = 5X^4 - 12X^3 + 6X^2 + 4X - 3 have greatest common divisor
X^3 - 3X^2 + 3X - 1 = (X - 1)^3. (To see this, we can use the division algorithm in ℚ[X]:

    f(X) = (1/5 X - 3/25) f'(X) - 16/25 (X^3 - 3X^2 + 3X - 1),
    f'(X) = (5X + 3)(X^3 - 3X^2 + 3X - 1) + 0,

so X^3 - 3X^2 + 3X - 1 is the g.c.d. of f and f'.) The "squarefree" polynomial
g(X) = f(X)/(X - 1)^3 = X^2 - 1 = (X - 1)(X + 1) has the two roots ±1. Thus,
f(X) = (X - 1)^4 (X + 1) has 1 as a 4-fold root and -1 as a simple root.
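The computation in Example 2 can be reproduced mechanically with the Euclidean algorithm over ℚ. The sketch below is one possible arrangement, not a prescribed method: the helper names (`trim`, `poly_divmod`, `derivative`, `monic_gcd`) and the highest-degree-first coefficient lists are assumptions, and the input polynomials are assumed to be non-zero.

```python
from fractions import Fraction

def trim(f):
    """Drop leading zero coefficients (coefficients are highest degree first)."""
    i = 0
    while i < len(f) and f[i] == 0:
        i += 1
    return f[i:]

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g, over Q."""
    g = trim([Fraction(c) for c in g])
    r = trim([Fraction(c) for c in f])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[0] / g[0]
        q[len(q) - 1 - shift] = coeff      # coefficient of X^shift in q
        r = trim([r[i] - coeff * g[i] if i < len(g) else r[i]
                  for i in range(len(r))])
    return q, r

def derivative(f):
    """Formal derivative of f."""
    n = len(f) - 1
    return [c * (n - i) for i, c in enumerate(f[:-1])]

def monic_gcd(f, g):
    """Monic greatest common divisor, found by the Euclidean algorithm."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, r
    return [c / f[0] for c in f]

f = [1, -3, 2, 2, -3, 1]                   # X^5 - 3X^4 + 2X^3 + 2X^2 - 3X + 1
d = monic_gcd(f, derivative(f))            # expected to be (X - 1)^3
squarefree, remainder = poly_divmod(f, d)  # expected to be X^2 - 1, remainder 0
```

No factorization of f is used anywhere; only divisions with remainder are performed.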


5. Vieta's formulas. When discussing linear systems, we noted the beneficial 
effect a good system of notation had on the development of the study of linear equations, in 
particular, leading to the theory of determinants. This was all accomplished by 
mathematicians of the eighteenth and early nineteenth centuries. But much earlier, at a 
time when algebra was still considered to be "solving equations", the perfection of algebraic 
notation by Vieta and Descartes stimulated the development of the theory of polynomials and 
algebraic equations. After studying special types of equations with numerical coefficients, 
which impeded the discovery of general principles, it was a bold step to introduce equations 
with letter coefficients. As often happens, the development of new notation led to new 
results. Descartes discovered revolutionary new applications of algebra to geometry. We 


shall discuss a more modest, but nevertheless important, achievement of his predecessor
Vieta.
Suppose that a monic polynomial f ∈ K[X] of degree n has n roots
c_1, ..., c_n in the field K or in some extension of K, where we allow the
possibility of multiple roots, i.e., that some of the c_i are the same. Then, by
Theorem 2, we have the factorization

    f(X) = (X - c_1)(X - c_2) ... (X - c_n).

We write f(X) in the usual way in powers of X:

    f(X) = X^n + a_1 X^{n-1} + ... + a_k X^{n-k} + ... + a_n,

where we find the coefficients by multiplying together all of the X - c_i and combining
similar terms. We obtain the following expressions for the coefficients a_i in terms of
the roots c_i:

    a_1 = -(c_1 + c_2 + ... + c_n),
    a_2 = c_1 c_2 + c_1 c_3 + ... + c_{n-1} c_n,
    . . . . . . . . . . . . . . . . . . . . . .
    a_k = (-1)^k \sum_{i_1 < i_2 < ... < i_k} c_{i_1} c_{i_2} ... c_{i_k},         (12)
    . . . . . . . . . . . . . . . . . . . . . .
    a_n = (-1)^n c_1 c_2 ... c_n.

These formulas are called Vieta's formulas.

If f were not a monic polynomial, i.e., if its leading coefficient were a_0 ≠ 1,
then the same formulas (12) would give expressions for the ratios a_i/a_0.
Vieta's formulas, which give the explicit relationship between the roots and the
coefficients of any monic polynomial, have the remarkable property that they do not change
under any permutation of the roots c_1, ..., c_n. This will lead us to introduce the notion
of a symmetric function, in much the same way that the example of determinants led us to
the notion of skew-symmetric functions. According to the definition given in the corollary
to Theorem 3' in Subsection 2 of §2 Ch. 5, an element π of the symmetric group S_n
acts on a function f(x_1, ..., x_n) of n variables by the rule

    (πf)(x_1, ..., x_n) = f(x_{π^{-1}(1)}, ..., x_{π^{-1}(n)}).

We say that f is symmetric if πf = f for all π ∈ S_n. As an example of symmetric
functions we have for each k = 1, ..., n the k-th elementary symmetric function s_k:

    s_k(x_1, ..., x_n) = \sum_{1 ≤ i_1 < i_2 < ... < i_k ≤ n} x_{i_1} x_{i_2} ... x_{i_k}.      (13)

Using these elementary symmetric functions, we can re-write (12) in the form

    a_k = (-1)^k s_k(c_1, ..., c_n),    k = 1, 2, ..., n,                          (12')

so that the coefficient a_k is (up to a sign) equal to the k-th symmetric function
evaluated at the roots of the polynomial. Note that, by definition, a_k ∈ K, although the
roots c_1, ..., c_n may lie in a larger field F ⊃ K. Here we shall not be concerned
with the problem of finding a field F over which f decomposes into linear factors. In
some situations we will know for various reasons that f splits entirely into linear factors,
i.e., has n roots in the field K.
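The relation between the coefficients of a monic polynomial and the elementary symmetric functions of its roots can be checked on small examples. The following is only a sketch with hypothetical function names; it multiplies out the linear factors and compares each coefficient with (-1)^k times the k-th elementary symmetric function of the roots.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(k, values):
    """Sum of all products of k of the values with increasing indices."""
    return sum(prod(combo) for combo in combinations(values, k))

def monic_from_roots(roots):
    """Coefficients [1, a_1, ..., a_n] of (X - c_1) ... (X - c_n), highest degree first."""
    coeffs = [1]
    for c in roots:
        # Multiply the current polynomial by (X - c): shift, then subtract c times it.
        coeffs = [a - c * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

roots = [1, 2, 3]
coeffs = monic_from_roots(roots)     # X^3 - 6X^2 + 11X - 6
vieta_holds = all(
    coeffs[k] == (-1) ** k * elementary_symmetric(k, roots)
    for k in range(1, len(roots) + 1)
)
```

Permuting the list `roots` changes neither `coeffs` nor the symmetric functions, which is the invariance noted in the text.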


Example. Consider the polynomial X^{p-1} - 1 over the finite field F_p. We know
that x^{p-1} = 1 for all x ∈ F_p*, i.e., all non-zero elements are roots of the polynomial
X^{p-1} - 1. We thus have the factorization

    X^{p-1} - 1 = (X - 1)(X - 2) ... (X - (p-1)).                                  (14)

(We are supposing enough familiarity with the field F_p so that the reader will not be
confused if we use the same numbers 1, 2, ..., p-1 to denote both ordinary integers
and elements of F_p = ℤ/pℤ, i.e., cosets {i}_p.) By (12') and (14) we obtain

    s_k(1, 2, ..., p-1) ≡ 0 (mod p),    k = 1, 2, ..., p-2,

    s_{p-1}(1, 2, ..., p-1) = (p-1)! ≡ -1 (mod p).

The last congruence, when written in the form

    (p-1)! + 1 ≡ 0 (mod p)                                                        (15)

is known as Wilson's Theorem. For (15) to hold for an integer p is actually equivalent
to p being a prime number. Namely, we just showed that (15) holds if p is prime.
Conversely, p = p_1 p_2 with 1 < p_1 < p ⟹ (p-1)! = p_1 t ⟹ (p-1)! + 1 ≢ 0 (mod p_1)
⟹ (p-1)! + 1 ≢ 0 (mod p). □
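Both congruences in this example are easy to test for small moduli. The sketch below is a brute-force check, not an efficient primality test; the function names and the bound 30 are arbitrary choices made for the illustration.

```python
from itertools import combinations
from math import factorial, prod

def satisfies_wilson(p):
    """True when (p - 1)! + 1 is divisible by p."""
    return (factorial(p - 1) + 1) % p == 0

def elementary_symmetric_mod(k, p):
    """s_k(1, 2, ..., p-1) reduced modulo p."""
    return sum(prod(combo) for combo in combinations(range(1, p), k)) % p

# Integers 2 <= p < 30 for which congruence (15) holds.
wilson_numbers = [p for p in range(2, 30) if satisfies_wilson(p)]

# For p = 7 the functions s_1, ..., s_5 of 1, ..., 6 should all vanish modulo 7.
middle_values = [elementary_symmetric_mod(k, 7) for k in range(1, 6)]
```

The list `wilson_numbers` consists exactly of the primes below 30, in line with the equivalence stated above.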


EXERCISES

1. Is the ring of polynomial functions over ℤ_p an integral domain?

2. Let K be an infinite field, and let f be a non-zero polynomial in
K[X_1, ..., X_n]. Using Theorem 3 and induction on n, prove that there exist
a_1, ..., a_n ∈ K such that f(a_1, ..., a_n) ≠ 0. Thus, we have an isomorphism of
K[X_1, ..., X_n] with the ring of polynomial functions in n variables over K.

3. Show that a non-zero polynomial f ∈ ℤ_p[X_1, ..., X_n] of degree < p in
each of the variables has the property in Exercise 2, i.e., f(a_1, ..., a_n) ≠ 0 for some
a_1, ..., a_n ∈ ℤ_p. Prove that any polynomial f ∈ ℤ_p[X_1, ..., X_n] can be written in
the form

    f(X_1, ..., X_n) = \sum_{i=1}^{n} g_i(X_1, ..., X_n)(X_i^p - X_i) + f*(X_1, ..., X_n),

where f* is a reduced polynomial (deg_{X_i} f* < p, i = 1, 2, ..., n) whose total degree
is ≤ deg f. Conclude from this that the map f ↦ \tilde{f} = \tilde{f*} is an epimorphism from the
ring ℤ_p[X_1, ..., X_n] to the ring of polynomial functions in n variables over ℤ_p
with kernel L = \sum_{i=1}^{n} (X_i^p - X_i) ℤ_p[X_1, ..., X_n].


4. Prove: Theorem (Chevalley). Let f(X_1, ..., X_n) be a homogeneous
polynomial over ℤ_p of degree r < n. Then the equation f(x_1, ..., x_n) = 0 has at
least one non-trivial solution.

By means of a slightly modified argument (computing the sum
\sum g(x_1, ..., x_n) in two ways), prove that the total number of solutions to
the equation f(x_1, ..., x_n) = 0 is always divisible by p.

5. Let f(x_1, ..., x_n) be a quadratic form with integral coefficients. Chevalley's
Theorem (Exercise 4), stated in the language of congruences, tells us that, if n ≥ 3,
then the congruence

    f(x_1, ..., x_n) ≡ 0 (mod p)

has a non-zero solution. Show that all solutions of the congruence x^2 - 2y^2 ≡ 0 (mod 5)
are trivial, and hence we cannot remove the condition r < n in Chevalley's Theorem.


6. Show that g.c.d.(f, f') = 1 if char K = 0, f is an irreducible polynomial
over K, and f' is its derivative.

7. Prove that f' = 0 ⟹ f = const if f(X) is a polynomial over a field of
characteristic zero, and that f' = 0 ⟹ f(X) is of the form g(X^p) for some g, if
f(X) is a polynomial over a field of characteristic p > 0.

8. From Subsection 3 we know that every differentiation of the polynomial ring
K[X] has the form T_u : f ↦ uf', where u ∈ K[X]. Prove the following assertions:

(i) The set of polynomials which go to zero under the differentiation is the set of
constants, which is a subring of K[X].
(ii) The product T_u T_v is not, in general, a differentiation, but, if
char K = p > 0, then T_u^p is a differentiation.
(iii) The commutator [T_u, T_v] = T_u T_v - T_v T_u is always a differentiation of the
form T_w, where w = uv' - u'v.


9. If we are dealing with a polynomial ring K[X_1, ..., X_n] in n variables, it
is natural to introduce partial differentiation operators with respect to each variable:

    ∂/∂X_k : X_1^{i_1} ... X_k^{i_k} ... X_n^{i_n} ↦ i_k X_1^{i_1} ... X_k^{i_k - 1} ... X_n^{i_n}.

(i) Show that the set of "constants" (polynomials which go to zero) for ∂/∂X_k is the
polynomial ring K[X_1, ..., X_{k-1}, X_{k+1}, ..., X_n] in n-1 variables.

(ii) Let f(X_1, ..., X_n) be a homogeneous polynomial of degree m. Prove
Euler's identity:

    \sum_{k=1}^{n} X_k ∂f/∂X_k = m f(X_1, ..., X_n).

Conversely, if char K = 0, show that any polynomial satisfying Euler's identity must be
homogeneous of degree m.

10. Show that a polynomial X^n + a_1 X^{n-1} + ... + a_n ∈ ℤ_2[X] has no linear
factors if and only if a_n(1 + a_1 + ... + a_n) ≠ 0. Here is a complete list of all irreducible
polynomials over ℤ_2 of degree n ≤ 3: X, X + 1, X^2 + X + 1, X^3 + X + 1, X^3 + X^2 + 1.
Write down all irreducible polynomials over ℤ_2 of degree 4 and 5 (there are 3
and 6 of them, respectively).


11. Prove that the polynomial X^5 - X^2 + 1 is irreducible over ℤ_2, and hence
over ℚ.

Similarly, use a congruence mod 3 to prove that X^5 - X - 1 is irreducible
over ℚ.


§2. Symmetric polynomials


1. The ring of symmetric polynomials. Following the definition of a symmetric function at the end of the last section, we now introduce the analogous notion in the polynomial ring A[X_1, ..., X_n], where A is an integral domain. At first glance, it seems that Theorem 3 of §1 (and its generalization to polynomials and functions of several variables), which allows us to think of polynomials as a subring of the ring of functions, should make it unnecessary to make a new definition. However, in that theorem we had to assume that the integral domain A has infinitely many elements; we would like to give a universal construction, valid for any A.

So let us again return to the corollary of Theorem 3' in Subsection 2 of §2 Ch. 5, and associate to every permutation π ∈ S_n the automorphism π : A[X_1, ..., X_n] → A[X_1, ..., X_n] which takes a polynomial f ∈ A[X_1, ..., X_n] to the polynomial πf defined by:

    (πf)(X_1, ..., X_n) = f(X_{π^{-1}(1)}, ..., X_{π^{-1}(n)}).

The polynomial f is called symmetric if πf = f for all π ∈ S_n. As in the case of functions, we introduce the elementary symmetric polynomials s_k:

    s_k(X_1, ..., X_n) = Σ_{1 ≤ i_1 < i_2 < ... < i_k ≤ n} X_{i_1} X_{i_2} ... X_{i_k},   k = 1, 2, ..., n.     (1)


We can see that the s_k are in fact symmetric either directly or by considering the polynomial

    f(Y) = (Y - X_1)(Y - X_2) ... (Y - X_n) = Y^n - s_1 Y^{n-1} + s_2 Y^{n-2} - ... + (-1)^n s_n     (2)

over the ring A[X_1, ..., X_n] in the new variable Y and observing that the expression for f(Y) does not change if we permute the linear factors Y - X_i; hence the s_k are symmetric.
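As a small numerical illustration of identity (2), the following Python sketch evaluates the s_k at an integer point directly from definition (1) and compares them with the coefficients obtained by multiplying out the linear factors. The function names and the sample point are illustrative choices, not part of the text.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(xs):
    """Values s_1, ..., s_n at the point xs, straight from definition (1)."""
    n = len(xs)
    return [sum(prod(c) for c in combinations(xs, k)) for k in range(1, n + 1)]

def expand_product(xs):
    """Coefficients of (Y - x_1)...(Y - x_n), highest power of Y first."""
    coeffs = [1]
    for x in xs:
        # Multiply the polynomial built so far by (Y - x).
        coeffs = [a - x * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

xs = [2, 3, 5]                    # hypothetical sample values for X_1, X_2, X_3
s = elementary_symmetric(xs)
coeffs = expand_product(xs)
# Identity (2): the coefficient of Y^(n-k) is (-1)^k s_k.
signs_match = all(coeffs[k] == (-1) ** k * s[k - 1] for k in range(1, len(xs) + 1))
```

Reordering the entries of xs leaves both lists unchanged, which is the symmetry observed above.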


Note that if we substitute zero in place of X_n in (2), we obtain

    (Y - X_1) ... (Y - X_{n-1}) Y = Y^n - (s_1)_0 Y^{n-1} + ... + (-1)^{n-1} (s_{n-1})_0 Y,

where (s_k)_0 is the polynomial obtained by substituting X_n = 0 in s_k. Canceling Y from both sides (which is justified by Theorem 3 of §4 Ch. 4, applied to A[X_1, ..., X_{n-1}, Y]), we arrive at the identity

    (Y - X_1) ... (Y - X_{n-1}) = Y^{n-1} - (s_1)_0 Y^{n-2} + ... + (-1)^{n-1} (s_{n-1})_0.     (3)


Comparing (2) and (3), we conclude that (s_1)_0, ..., (s_{n-1})_0 are the elementary symmetric polynomials in the n-1 variables X_1, ..., X_{n-1}.


Next note that, since π is an automorphism of the ring A[X_1, ..., X_n], it follows that any linear combination of symmetric polynomials and any product of symmetric polynomials are symmetric polynomials. This means that the set of all symmetric polynomials is a subring of A[X_1, ..., X_n]. Our next goal is to discover the structure of this subring.


2. The fundamental theorem on symmetric polynomials. It turns out that the most general way of obtaining symmetric polynomials is as follows. Take any polynomial g ∈ A[Y_1, ..., Y_n] and substitute s_1, ..., s_n in place of Y_1, ..., Y_n, respectively. The resulting polynomial

    f(X_1, ..., X_n) = g(s_1(X_1, ..., X_n), ..., s_n(X_1, ..., X_n))

is, of course, symmetric. We shall prove that any symmetric polynomial f can be obtained in this way.
Notice that a monomial Y_1^{i_1} ... Y_n^{i_n} in g becomes a homogeneous polynomial of degree i_1 + 2i_2 + 3i_3 + ... + n i_n in the variables X_1, ..., X_n after substituting Y_k = s_k(X_1, ..., X_n), since s_k is homogeneous of degree k. The sum i_1 + 2i_2 + ... + n i_n is usually called the weight of the monomial Y_1^{i_1} ... Y_n^{i_n}. By the weight of the polynomial g(Y_1, ..., Y_n), we mean the maximum weight of a monomial occurring in g.


The fundamental fact about symmetric polynomials is 


THEOREM 1. Let f ∈ A[X_1, ..., X_n] be a symmetric polynomial of total degree m over an integral domain A. Then there exists a unique polynomial g ∈ A[Y_1, ..., Y_n] such that

    f(X_1, ..., X_n) = g(s_1, ..., s_n).

The polynomial g has weight m.


The proof consists of two parts. 


I. Proof that such a polynomial g exists. We use induction on both n and m (see §7 of Ch. 1). If n = 1, the theorem is obvious, since s_1 = X_1, and f(X_1) = f(s_1). Suppose that we know that g exists for polynomials of ≤ n-1 variables. We must prove the assertion for polynomials of degree m in n variables. We now use induction on m. There is nothing to prove if m = 0, so we suppose that m > 0 and that the existence of g has been established for any polynomial of degree < m.


Now let f(X_1, ..., X_n) be our given polynomial of total degree m. If we set X_n = 0, by the induction assumption we can write

    f(X_1, ..., X_{n-1}, 0) = g_1((s_1)_0, ..., (s_{n-1})_0),

where g_1 is some polynomial in Y_1, ..., Y_{n-1} of weight ≤ m (the weight could be < m, since the degree of f might decrease after substituting X_n = 0), and (s_1)_0, ..., (s_{n-1})_0 are the elementary symmetric polynomials in the variables X_1, ..., X_{n-1} (see (3)). Obviously, deg g_1(s_1(X_1, ..., X_n), ..., s_{n-1}(X_1, ..., X_n)) ≤ m. Hence, the polynomial

    f_1(X_1, ..., X_n) = f(X_1, ..., X_n) - g_1(s_1, ..., s_{n-1})     (4)

has total degree in X_1, ..., X_n no greater than m, and, since it is the difference of two symmetric polynomials, is symmetric. In addition, f_1(X_1, ..., X_{n-1}, 0) = 0, so that X_n divides f_1: f_1 = X_n · f_1'. But, since f_1 is symmetric, we have f_1 = πf_1 = (πX_n) · (πf_1') for every π ∈ S_n, and πX_n runs through all of X_1, ..., X_n; i.e., f_1 is divisible by each of X_1, ..., X_n, and hence by the product s_n = X_1 X_2 ... X_n. Thus,

    f_1 = s_n · f_2,     (5)

where f_2 is also a symmetric polynomial, having degree deg f_2 = deg f_1 - n ≤ m - n.

By the induction assumption, there exists a polynomial g_2(Y_1, ..., Y_n) of weight ≤ m-n such that f_2(X_1, ..., X_n) = g_2(s_1, ..., s_n). Then, by (4) and (5), we obtain the following expression for f:

    f(X_1, ..., X_n) = g_1(s_1, ..., s_{n-1}) + s_n g_2(s_1, ..., s_n),

and we have found the desired polynomial g = g_1(Y_1, ..., Y_{n-1}) + Y_n g_2(Y_1, ..., Y_n) of weight ≤ m. Since deg f = m, the weight of g cannot be less than m, and so the weight exactly equals m.


II. Proof that g is unique. If we had two distinct polynomials g_1, g_2 such that g_1(s_1, ..., s_n) = g_2(s_1, ..., s_n), then we would have a polynomial g(Y_1, ..., Y_n) = g_1 - g_2 ≠ 0 for which g(s_1, ..., s_n) = 0. In other words, s_1, ..., s_n would be algebraically dependent over A (see the definition in Subsection 2 of §2 Ch. 5). We use induction on n to show that this is impossible. Supposing that polynomials g ≠ 0 for which g(s_1, ..., s_n) = 0 exist, we choose the one of minimal total degree. Considering g as a polynomial in Y_n over the ring A[Y_1, ..., Y_{n-1}], we write it in the form

    g(Y_1, ..., Y_n) = g_0(Y_1, ..., Y_{n-1}) + ... + g_k(Y_1, ..., Y_{n-1}) Y_n^k,   k = deg_{Y_n} g.

If g_0 = 0, then g = Y_n h, where h ∈ A[Y_1, ..., Y_n]. By assumption, s_n h(s_1, ..., s_n) = 0, so that, since A[X_1, ..., X_n] is an integral domain (Theorem 1' of §2 Ch. 5), this means that h(s_1, ..., s_n) = 0. But this is impossible, since deg h(Y_1, ..., Y_n) = deg g(Y_1, ..., Y_n) - 1. Thus, g_0 ≠ 0. Now consider the identity

    g_0(s_1, ..., s_{n-1}) + ... + g_k(s_1, ..., s_{n-1}) s_n^k = g(s_1, ..., s_n) = 0

in A[X_1, ..., X_n], and substitute 0 in place of X_n. Then all terms except for the first vanish, and we obtain

    g_0((s_1)_0, ..., (s_{n-1})_0) = 0,

where (s_1)_0, ..., (s_{n-1})_0 are the elementary symmetric polynomials in the variables X_1, ..., X_{n-1} (see (3)). But by the induction assumption, the (s_1)_0, ..., (s_{n-1})_0 are algebraically independent over A, which contradicts g_0 ≠ 0. This contradiction completes the proof of uniqueness, and hence of Theorem 1. □


Note that the proof of the first part (existence of g) was constructive, and so can actually be used to find the polynomial g in a given situation. It is also worth noting that the coefficients of g lie in the subring of A generated by the coefficients of f. For example, if A = Z, the coefficients of both f and g will be integers.


COROLLARY. Let f(X) = X^n + a_1 X^{n-1} + ... + a_{n-1} X + a_n be a monic polynomial of degree n in one variable X over a field K. Suppose that f(X) has n roots c_1, ..., c_n in some field F containing K. Let h(X_1, ..., X_n) be any symmetric polynomial in K[X_1, ..., X_n]. Then the value h(c_1, ..., c_n) obtained by substituting c_i in place of X_i, i = 1, ..., n, lies in K.

Proof. By the fundamental theorem on symmetric polynomials, there exists a polynomial g(Y_1, ..., Y_n) ∈ K[Y_1, ..., Y_n] such that h(X_1, ..., X_n) = g(s_1(X_1, ..., X_n), ..., s_n(X_1, ..., X_n)). Hence h(c_1, ..., c_n) = g(s_1(c_1, ..., c_n), ..., s_n(c_1, ..., c_n)); but, by Vieta's formulas (12) in §1, s_k(c_1, ..., c_n) = (-1)^k a_k ∈ K. Hence h(c_1, ..., c_n) = g(-a_1, ..., (-1)^n a_n) ∈ K. □


3. The method of undetermined coefficients. There are many different proofs of the fundamental theorem on symmetric polynomials, and several different methods for finding the expression for a given symmetric polynomial f in terms of the elementary symmetric polynomials are available. In order to describe one of the most commonly used methods, we introduce a new type of symmetric polynomial. To be definite, let us take the integral domain A to be Z or R. Let v = X_1^{i_1} ... X_n^{i_n} be a monomial. We shall call v a monotonic monomial if i_1 ≥ i_2 ≥ ... ≥ i_n. Let S(v) denote the sum of all distinct monomials in the set of n! monomials of the form πv, where π runs through S_n. In other words,

    S(v) = Σ_{π ∈ S_n/H} πv,

where the notation π ∈ S_n/H means that π runs through a set of left coset representatives of the subgroup H = {τ ∈ S_n | τv = v} in S_n (the reader should check that this set H really is a subgroup). An example of such a sum is the so-called power sum

    p_k(X_1, ..., X_n) = S(X_1^k) = X_1^k + X_2^k + ... + X_n^k,   k > 0.     (6)

In this example we clearly have H = S_{n-1}. Another example is S(X_1 X_2 ... X_k) = s_k(X_1, ..., X_n) (what is H in this case?). It is clear that S(v) is a homogeneous symmetric polynomial of the same total degree as v. Since S(v) = S(σv), ∀σ ∈ S_n, we need only consider S(v) for monotonic monomials v (since any v can be transformed into a monotonic v by a suitable σ ∈ S_n). It is also clear that any symmetric polynomial f over A is a linear combination of polynomials of the type S(v) with coefficients in A:

    f = Σ_v a_v S(v).

It is usually possible to see at a glance how to write f in this form; hence, the problem of expressing f in terms of the s_k reduces to the problem of expressing the S(v) in terms of the elementary symmetric polynomials.
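The sums S(v) are easy to enumerate mechanically. In the sketch below a monomial is represented by its tuple of exponents (i_1, ..., i_n), which is a representation chosen here for illustration; the distinct permuted tuples are the terms of S(v), and the order of the subgroup H is recovered as n! divided by the number of terms.

```python
from itertools import permutations
from math import factorial

def orbit_sum(exponents):
    """Exponent tuples of the distinct monomials occurring in S(v), largest first."""
    return sorted(set(permutations(exponents)), reverse=True)

def stabilizer_order(exponents):
    """Order of H = {tau : tau v = v}, computed as n! / (number of distinct monomials)."""
    return factorial(len(exponents)) // len(orbit_sum(exponents))

power_sum_terms = orbit_sum((3, 0, 0))   # the terms of p_3 in three variables
s2_terms = orbit_sum((1, 1, 0))          # the terms of s_2 in three variables
```

For v = X_1^3 with n = 3 this gives three terms and a stabilizer of order 2, in agreement with H = S_{n-1}.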


Let us agree to arrange the monomials in S(v) in lexicographic ("alphabetical") order, i.e., in such a way that a monomial v = X_1^{i_1} X_2^{i_2} ... X_n^{i_n} comes before a monomial w = X_1^{j_1} X_2^{j_2} ... X_n^{j_n} ("v is greater than w": v > w) if and only if the sequence i_1 - j_1, i_2 - j_2, ..., i_n - j_n has the form 0, ..., 0, t, ..., where t > 0 (there may be negative i_k - j_k to the right of t). Of course, we can also use lexicographic ordering to arrange the monomial terms in any polynomial f ∈ A[X_1, ..., X_n]. Note that the leading (or first) term when S(v) is arranged in lexicographic order is v, since v was assumed to be monotonic. If v = X_1^{i_1} X_2^{i_2} ... X_n^{i_n} is a monotonic monomial, then we can consider the product

    g_v = s_1^{i_1 - i_2} s_2^{i_2 - i_3} ... s_{n-1}^{i_{n-1} - i_n} s_n^{i_n},   s_k = s_k(X_1, ..., X_n),     (7)

in which the leading term is simply

    v = X_1^{i_1 - i_2} (X_1 X_2)^{i_2 - i_3} ... (X_1 ... X_{n-1})^{i_{n-1} - i_n} (X_1 ... X_n)^{i_n}

(since the leading term in a product is the product of the leading terms in each factor). It hence follows that the leading term in the difference S(v) - g_v is less than v. Thus,

    S(v) - g_v = Σ_{w < v} n_w S(w),

where n_w ∈ Z, and the summation is taken over the set of monotonic monomials w < v. The total degree of all of the w is the same as the total degree of v.
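The leading-term argument above can be turned into a procedure: repeatedly take the lexicographically largest monomial c·v of what remains, record c·g_v, and subtract it. The Python sketch below applies this directly to a symmetric polynomial f rather than to a single S(v), which is a slight variation on the text. Polynomials are stored as dictionaries from exponent tuples to integer coefficients; this representation and all function names are assumptions made for the example.

```python
from itertools import combinations

def poly_mul(p, q):
    """Product of two polynomials given as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def elementary(n, k):
    """The elementary symmetric polynomial s_k in n variables."""
    return {tuple(1 if i in idx else 0 for i in range(n)): 1
            for idx in combinations(range(n), k)}

def to_elementary(f, n):
    """Return {(j_1, ..., j_n): c}, meaning f = sum of c * s_1^j_1 ... s_n^j_n."""
    f = {e: c for e, c in f.items() if c != 0}
    g = {}
    while f:
        lead = max(f)                 # tuples compare lexicographically
        c = f[lead]
        # Exponents of s_1, ..., s_n in g_v, as in (7).
        js = tuple(lead[k] - (lead[k + 1] if k + 1 < n else 0) for k in range(n))
        if min(js) < 0:
            raise ValueError("leading monomial is not monotonic: f is not symmetric")
        g[js] = g.get(js, 0) + c
        term = {(0,) * n: c}          # build c * g_v
        for k, j in enumerate(js, start=1):
            for _ in range(j):
                term = poly_mul(term, elementary(n, k))
        for e, d in term.items():     # subtract it; the leading term cancels
            f[e] = f.get(e, 0) - d
            if f[e] == 0:
                del f[e]
    return g

p3 = {(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}
p3_in_s = to_elementary(p3, 3)
```

Each pass removes the current leading monomial and introduces only smaller monomials of the same degree, so the loop stops after finitely many steps.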
The following method now suggests itself for expressing S(v) in terms of elementary symmetric polynomials. Let deg v = m. Take all "monotonic" partitions

    (j_1, j_2, ..., j_n),   j_1 ≥ j_2 ≥ ... ≥ j_n ≥ 0,   j_1 + j_2 + ... + j_n = m,

of the positive integer m for which w = X_1^{j_1} X_2^{j_2} ... X_n^{j_n} ≤ v. Consider the set M_v of all such monomials w. For each w ∈ M_v we have the monomial g_w (see (7)). We already know that

    S(v) = Σ_{w ∈ M_v} n_w g_w,     (8)

where n_w are certain integers. The undetermined coefficients n_w (whence the name: the method of undetermined coefficients) can be found by successively substituting suitable integer values, usually zeros and ones, in place of X_1, ..., X_n in (8). The values of g_w and S(v) are known, and so we obtain a compatible system of linear equations for the unknowns n_w.


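Example. v = X_1^3, S(v) = p_3(X_1, X_2, X_3), n = 3. Here g_v = s_1^3, and (8) takes the form

    p_3 = s_1^3 + a s_1 s_2 + b s_3.

Setting X_1 = X_2 = 1, X_3 = 0, we have p_3 = 2, s_1 = 2, s_2 = 1, s_3 = 0; setting X_1 = X_2 = X_3 = 1, we have p_3 = 3, s_1 = 3, s_2 = 3, s_3 = 1. From the resulting linear system

    2 = 8 + 2a,
    3 = 27 + 9a + b

we find: a = -3 and b = 3, i.e., p_3 = s_1^3 - 3 s_1 s_2 + 3 s_3.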
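The substitution step of this example can be reproduced in a few lines. The following sketch, with helper names chosen here for illustration, evaluates s_k and p_k at the two points (1, 1, 0) and (1, 1, 1) and solves the resulting 2 × 2 system by Cramer's rule with exact fractions.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def s(k, xs):
    """Elementary symmetric polynomial s_k evaluated at xs."""
    return sum(prod(c) for c in combinations(xs, k))

def p(k, xs):
    """Power sum p_k evaluated at xs."""
    return sum(x ** k for x in xs)

points = [(1, 1, 0), (1, 1, 1)]
# Each row encodes  a * s_1 s_2 + b * s_3 = p_3 - s_1^3  at one point.
rows = [(s(1, x) * s(2, x), s(3, x), p(3, x) - s(1, x) ** 3) for x in points]
(a11, a12, r1), (a21, a22, r2) = rows
det = a11 * a22 - a12 * a21
a = Fraction(r1 * a22 - a12 * r2, det)
b = Fraction(a11 * r2 - r1 * a21, det)
```

The resulting identity can then be spot-checked at any other integer point.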


We have the following convenient formulas, called Newton's formulas, which can be used to express the power sums p_k(X_1, ..., X_n) as polynomials in s_1, ..., s_n:

    p_k - p_{k-1} s_1 + p_{k-2} s_2 - ... + (-1)^{k-1} p_1 s_{k-1} + (-1)^k k s_k = 0   for 1 ≤ k ≤ n,     (9)

    p_k - p_{k-1} s_1 + p_{k-2} s_2 - ... + (-1)^{n-1} p_{k-n+1} s_{n-1} + (-1)^n p_{k-n} s_n = 0   for k > n.     (10)

In order to prove these formulas, we make use of the obvious relations

    X_i^n - s_1 X_i^{n-1} + ... + (-1)^{n-1} s_{n-1} X_i + (-1)^n s_n = 0,

which are obtained by substituting Y = X_i in (2). We multiply each of these equations by X_i^{k-n} (k ≥ n):

    X_i^k - s_1 X_i^{k-1} + ... + (-1)^{n-1} s_{n-1} X_i^{k-n+1} + (-1)^n s_n X_i^{k-n} = 0,

and then sum over i from 1 to n. This gives us both formula (10) and also (9)


when k = n (p_0 = X_1^0 + ... + X_n^0 = n). Next, consider the following symmetric homogeneous polynomial f_{k,n} of degree k < n (or -∞ if f_{k,n} = 0):

    f_{k,n}(X_1, ..., X_n) = p_k - p_{k-1} s_1 + p_{k-2} s_2 - ... + (-1)^{k-1} p_1 s_{k-1} + (-1)^k k s_k.

Using induction on r = n-k, we prove that f_{k,n} is identically zero. We just showed this when r = 0. Now set X_n = 0 and note that the resulting symmetric polynomials (s_i)_0 and (p_i)_0 coincide with the polynomials s_i and p_i defined for the n-1 variables X_1, ..., X_{n-1} (see (3) and (6)). We obtain:

    f_{k,n}(X_1, ..., X_{n-1}, 0) = (p_k)_0 - (p_{k-1})_0 (s_1)_0 + ... + (-1)^{k-1} (p_1)_0 (s_{k-1})_0 + (-1)^k k (s_k)_0
                                  = f_{k,n-1}(X_1, ..., X_{n-1}) = 0,

since n-1-k = r-1 < r, so that the induction assumption applies. The relation f_{k,n}(X_1, ..., X_{n-1}, 0) = 0 shows that the polynomial f_{k,n} is divisible by X_n. Using the fact that f_{k,n} is symmetric (see the similar argument in the first part of the proof of Theorem 1), we find that

    f_{k,n}(X_1, ..., X_n) = s_n(X_1, ..., X_n) · g(X_1, ..., X_n),

which is only possible if g = 0, since deg s_n = n and deg f_{k,n} = k < n. Thus, f_{k,n} = 0, and formula (9) is proved.
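Formulas (9) and (10) give a direct recursion for the power sums. A minimal Python sketch, assuming the values s_1, ..., s_n are supplied as a list (the function name is hypothetical):

```python
def power_sums_from_elementary(s, kmax):
    """Given s = [s_1, ..., s_n], return [p_1, ..., p_kmax] using (9) and (10)."""
    n = len(s)

    def e(j):
        # s_j, with the convention s_j = 0 for j > n
        return s[j - 1] if 1 <= j <= n else 0

    p = []
    for k in range(1, kmax + 1):
        # p_k = s_1 p_{k-1} - s_2 p_{k-2} + ... ; here p[i] holds p_{i+1}
        total = sum((-1) ** (j - 1) * e(j) * p[k - j - 1] for j in range(1, k))
        if k <= n:
            # the extra term (-1)^(k-1) k s_k occurs only in formula (9)
            total += (-1) ** (k - 1) * k * e(k)
        p.append(total)
    return p

# s_1, s_2, s_3 of the sample point (2, 3, 5)
sums = power_sums_from_elementary([10, 31, 30], 5)
```

For k > n the sum automatically stops at j = n because e(j) vanishes beyond that, which is exactly the shape of (10).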




4. The discriminant of a polynomial. In the ring K[X_1, ..., X_n], consider the polynomial

    Δ_n = ∏_{1 ≤ j < i ≤ n} (X_i - X_j),

which can clearly be written as a Vandermonde determinant

          | 1          1          ...   1          |
          | X_1        X_2        ...   X_n        |
    Δ_n = | X_1^2      X_2^2      ...   X_n^2      |     (11)
          | ...        ...        ...   ...        |
          | X_1^{n-1}  X_2^{n-1}  ...   X_n^{n-1}  |


Since the determinant is a skew-symmetric function of its columns, it follows that π(Δ_n) = ε_π Δ_n, where ε_π is the sign of the permutation π ∈ S_n. But then Δ_n^2 is a symmetric polynomial, and, by the fundamental theorem, it can be expressed as a polynomial in the elementary symmetric functions:

    Δ_n^2 = ∏_{j < i} (X_i - X_j)^2 = Dis(s_1, ..., s_n).

The polynomial Dis in the variables s_1(X_1, ..., X_n), ..., s_n(X_1, ..., X_n) is called the discriminant of the n-tuple X_1, ..., X_n. Its coefficients obviously lie in Z. If we substitute x_i ∈ F in place of X_i, i = 1, 2, ..., n (where F is an extension of K), we obtain the discriminant of the n-tuple of elements of F. If not all the x_i ∈ F are distinct, then the discriminant vanishes, since at least one of the factors x_i - x_j is zero. The name "discriminant" comes from the ability of this function to distinguish the case when two or more of the x_i coincide from the case when they are all distinct.
i 
A convenient method for obtaining the discriminant is based on interpreting Δ_n^2 as the product of the determinant (11) and the transpose determinant: Δ_n^2 = Δ_n · ^tΔ_n (recall that det ^tA = det A for any square matrix A). Using the rule for matrix multiplication, we immediately obtain:

                         | n        p_1      p_2      ...  p_{n-1}  |
                         | p_1      p_2      p_3      ...  p_n      |
    Dis(s_1, ..., s_n) = | p_2      p_3      p_4      ...  p_{n+1}  |     (12)
                         | ...      ...      ...      ...  ...      |
                         | p_{n-1}  p_n      p_{n+1}  ...  p_{2n-2} |

where p_k are our familiar power sums (6). If we compute the p_k using the recursive formulas (9) and (10), we obtain an explicit expression for Dis(s_1, ..., s_n). For example, p_1 = s_1 and p_2 = s_1^2 - 2 s_2, so that

    Dis(s_1, s_2) = 2 p_2 - p_1^2 = s_1^2 - 4 s_2.     (13)
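Formula (12), combined with Newton's recursion, gives a mechanical way to evaluate Dis at given values of s_1, ..., s_n. The sketch below is one possible arrangement; the cofactor-expansion determinant is chosen only for brevity and is adequate for small n, and all names are illustrative.

```python
def power_sums(s, kmax):
    """[p_1, ..., p_kmax] from s = [s_1, ..., s_n] by Newton's formulas (9), (10)."""
    n, p = len(s), []
    for k in range(1, kmax + 1):
        total = sum((-1) ** (j - 1) * s[j - 1] * p[k - j - 1]
                    for j in range(1, min(k - 1, n) + 1))
        if k <= n:
            total += (-1) ** (k - 1) * k * s[k - 1]
        p.append(total)
    return p

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def dis(s):
    """Dis(s_1, ..., s_n) as the determinant (12) of power sums, with p_0 = n."""
    n = len(s)
    p = [n] + power_sums(s, 2 * n - 2)          # p[k] holds p_k
    return det([[p[i + j] for j in range(n)] for i in range(n)])

quadratic = dis([5, 6])       # roots 2 and 3: s_1 = 5, s_2 = 6
cubic = dis([0, -7, -6])      # roots 1, 2, -3: s_1 = 0, s_2 = -7, s_3 = -6
```

In the cubic case the result equals the product of the squared root differences, (1 - 2)^2 (1 + 3)^2 (2 + 3)^2 = 400.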


Now suppose that we are given a monic polynomial

    f(X) = X^n + a_1 X^{n-1} + ... + a_{n-1} X + a_n ∈ K[X],

which has n roots c_1, ..., c_n in K or in some extension F of K. We know from Vieta's formulas that a_k = (-1)^k s_k(c_1, ..., c_n).

Definition. The discriminant of the n-tuple of roots c_1, ..., c_n of a polynomial f, or equivalently, the value of Dis(s_1, ..., s_n) obtained by substituting (-1)^k a_k in place of s_k, is called the discriminant of the polynomial f and is denoted D(f). It is also called the discriminant of the equation

    f(x) = x^n + a_1 x^{n-1} + ... + a_{n-1} x + a_n = 0.     (14)


It is clear that D(f) ∈ K (recall the corollary to Theorem 1). The following fact is also an immediate consequence of the definition of the discriminant:

PROPOSITION. D(f) = 0 if and only if the equation (14) has multiple roots (i.e., at least one root of multiplicity k > 1). □

Taking into account Corollary 2 of Theorem 5 in §1, we now have two methods of deciding, without leaving the ground field K, whether or not a polynomial f ∈ K[X] has multiple roots. But this is not the only purpose of the discriminant. For example, when applied to the quadratic polynomial f(X) = X^2 + aX + b with real coefficients a and b, formula (13) gives D(f) = a^2 - 4b, which is a familiar expression from elementary algebra. In particular, the sign of D(f) determines whether the equation x^2 + ax + b = 0 has two real roots or two complex conjugate roots.


To take another example, let us compute the discriminant of the so-called incomplete cubic equation

    f(x) = x^3 + ax + b = 0.     (15)

Here s_1 = 0, s_2 = a, s_3 = -b, and, computing p_k by the recursive formulas, we obtain p_1 = s_1 = 0, p_2 = s_1^2 - 2 s_2 = -2a, p_3 = s_1^3 - 3 s_1 s_2 + 3 s_3 = -3b, p_4 = s_1^4 - 4 s_1^2 s_2 + 4 s_1 s_3 + 2 s_2^2 = 2a^2. Hence, by formula (12), we have

           |  3     0    -2a   |
    D(f) = |  0    -2a   -3b   | = -4a^3 - 27b^2.     (16)
           | -2a   -3b    2a^2 |


D(f) is given by a more complicated expression than (16) if we take the complete cubic equation x^3 + a_1 x^2 + a_2 x + a_3 = 0, but we can avoid the added complication by reducing the complete cubic to the incomplete cubic as follows. Whenever we have a polynomial f in the form (14), we can make a change of variable from x to y = x + a_1/n. Substituting x = y - a_1/n in (14) and using the binomial formula, we find that the polynomial

    f(y - a_1/n) = y^n + b_2 y^{n-2} + ... + b_n     (17)

has zero coefficient of y^{n-1}. If we know a root y_0 of (17), we immediately know a root x_0 = y_0 - a_1/n of the original equation (14). Hence, without loss of generality we may always assume that a_1 = 0.

If we want to find a general formula for the solutions of (15) (this was a major achievement of del Ferro, Cardano, and other mathematicians of the Middle Ages), we inevitably end up making use of the discriminant (16) (see the formulas in Ch. 1).


5, The resultant. The basic property of D(f), given in the proposition in the 
previous subsection, can also be interpreted as a criterion for when a polynomial f and 
its derivative f' have a common root (or common factor). In the last analysis, this 
criterion is based on the Euclidean algorithm. This leads one to hope that, more generally, 
we can find a similar criterion which allows us to determine directly from the coefficients 


whether two polynomials f, g ∈ K[X] have a common factor.


Let

    f(X) = a_0 X^n + a_1 X^{n-1} + ... + a_{n-1} X + a_n,
    g(X) = b_0 X^m + b_1 X^{m-1} + ... + b_{m-1} X + b_m

be two polynomials with coefficients in K. Here n > 0, m > 0, and we allow the possibility that a_0 = 0 or b_0 = 0.

Definition. The resultant Res(f,g) of f and g is the homogeneous polynomial in the coefficients of f and g (having degree m in a_0, ..., a_n and degree n in b_0, ..., b_m) given by

                | a_0  a_1  ...  a_n                 |
                |      a_0  a_1  ...  a_n            |   m rows
                |           ....................     |
                |                a_0  a_1  ...  a_n  |
    Res(f,g) =  | b_0  b_1  ...  b_m                 |
                |      b_0  b_1  ...  b_m            |   n rows
                |           ....................     |
                |                b_0  b_1  ...  b_m  |

(the entries not shown are zeros).

Implicit in this definition is an assertion about the degree in the a_i and the b_j of the above determinant. But this follows immediately from the properties of determinants: if we replace a_i by t a_i in the first m rows, then Res(tf, g) = t^m Res(f,g), and so the resultant is homogeneous of degree m in the a_i (see the exercises of §2 Ch. 3); we similarly show that it is homogeneous of degree n in the b_j.


We now derive the basic properties of the resultant. 


Res 1. Res(f,g) = 0 if and only if either a_0 = 0 = b_0 or else f and g have a common factor of degree > 0 in K[X].

Proof. We first show that the condition "a_0 = 0 = b_0 or else f and g have a common factor in K[X] of degree > 0" holds if and only if there exist polynomials f_1 and g_1, not both zero, such that

    f g_1 + g f_1 = 0,   deg f_1 < n,   deg g_1 < m.     (18)

To see this, let h = g.c.d.(f,g) have degree deg h > 0. Then f = h f_1, g = -h g_1, and hence f g_1 + g f_1 = 0. In addition, deg f_1 < n and deg g_1 < m, so that (18) holds. If a_0 = 0 = b_0, we can set f_1 = f, g_1 = -g.

Conversely, suppose that (18) holds. If we had g.c.d.(f,g) = 1, since K[X] is a unique factorization domain (see §3 Ch. 5), we would have the implication

    f g_1 = -g f_1  ⇒  f | f_1,  g | g_1.

Thus deg f < n and deg g < m, i.e., a_0 = 0 = b_0.

We now prove that (18) is equivalent to Res(f,g) = 0. If we set

    f_1 = c_0 X^{n-1} + c_1 X^{n-2} + ... + c_{n-1},
    g_1 = d_0 X^{m-1} + d_1 X^{m-2} + ... + d_{m-1},

and use the rules for operating with polynomials to compute the coefficients of f g_1 + g f_1, which has degree ≤ n+m-1, we can write the condition (18) in the form of a square homogeneous system of linear equations with n+m unknowns d_0, d_1, ..., d_{m-1}, c_0, c_1, ..., c_{n-1}:

    a_0 d_0                     + b_0 c_0                     = 0,
    a_1 d_0 + a_0 d_1           + b_1 c_0 + b_0 c_1           = 0,
    a_2 d_0 + a_1 d_1 + a_0 d_2 + b_2 c_0 + b_1 c_1 + b_0 c_2 = 0,     (19)
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    a_n d_{m-1}                 + b_m c_{n-1}                 = 0.

The determinant of the system (19) (more precisely, the determinant of the transpose of the matrix of (19)) is exactly Res(f,g). Thus, (19) has a non-trivial solution if and only if Res(f,g) = 0, and any non-zero solution to (19) gives us a pair of polynomials f_1, g_1 satisfying (18). □
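The determinant in the definition of Res(f,g) is straightforward to build and evaluate exactly. In the Python sketch below, polynomials are given as coefficient lists [a_0, ..., a_n] with the highest power first; this convention, the function names, and the use of Gaussian elimination over the rationals are assumptions made for the illustration.

```python
from fractions import Fraction

def sylvester(f, g):
    """Matrix from the definition: m shifted rows of f, then n shifted rows of g."""
    n, m = len(f) - 1, len(g) - 1
    rows = [[0] * i + list(f) + [0] * (m - 1 - i) for i in range(m)]
    rows += [[0] * i + list(g) + [0] * (n - 1 - i) for i in range(n)]
    return rows

def det(matrix):
    """Exact determinant by Gaussian elimination with rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size, result = len(a), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result          # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result

def resultant(f, g):
    return det(sylvester(f, g))

common = resultant([1, -3, 2], [1, 3, -10])   # (X-1)(X-2) and (X-2)(X+5)
coprime = resultant([1, -3, 2], [1, -3])      # (X-1)(X-2) and X-3
```

In agreement with Res 1, the first value is zero because X - 2 divides both polynomials, and the second is not.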


Res 2. Suppose that the polynomials f and g split completely into linear factors in K[X]:

    f(X) = a_0 (X - α_1) ... (X - α_n),
    g(X) = b_0 (X - β_1) ... (X - β_m).

Then

    Res(f,g) = a_0^m ∏_{i=1}^{n} g(α_i) = (-1)^{mn} b_0^n ∏_{j=1}^{m} f(β_j) = a_0^m b_0^n ∏_{i,j} (α_i - β_j).

Proof. It is clear that these formulas, if they are true, must be of a "universal" nature, not depending on the particular properties of f and g. This simple "philosophical" principle, which we shall not justify or discuss in detail here, allows us to restrict ourselves to the "general case" when we suppose that all of the g(α_1), ..., g(α_n) are distinct, and that all of the f(β_1), ..., f(β_m) are distinct.

Next, since Res(g,f) = (-1)^{mn} Res(f,g) (see the definition), it suffices to verify the equality Res(f,g) = a_0^m ∏_{i=1}^{n} g(α_i). To do this we introduce a new variable Y and consider the polynomials f(X) and g(X) - Y over the field of rational functions K(Y). Using the definition of the resultant and replacing b_m by b_m - Y, we find that

    Res(f, g-Y) = (-1)^n a_0^m Y^n + ... + Res(f,g)

is a polynomial of degree n in Y with leading coefficient (-1)^n a_0^m and with constant term Res(f,g). The polynomials f(X) and g(X) - g(α_i) have the root α_i in common, and so both are divisible by X - α_i. Using the property Res 1, we have Res(f, g - g(α_i)) = 0. By Bezout's Theorem, the polynomial Res(f, g-Y) must be divisible by g(α_i) - Y, 1 ≤ i ≤ n. Since we have assumed that all of the g(α_i) are distinct, it follows that

    Res(f, g-Y) = a_0^m ∏_{i=1}^{n} (g(α_i) - Y).

Setting Y = 0, we obtain the required equality. □
= J 


We extend the definition of the discriminant in Subsection 4 to the case of non-monic polynomials, by setting

    D(f) = a_0^{2n-2} ∏_{j < i} (α_i - α_j)^2   for   f(X) = a_0 ∏_{i=1}^{n} (X - α_i).

Res 3. The following formula holds:

    D(f) = (-1)^{n(n-1)/2} a_0^{-1} Res(f,f').     (20)

Proof. According to Res 2,

    Res(f,f') = a_0^{n-1} ∏_{i=1}^{n} f'(α_i).

But

    f'(α_i) = a_0 ∏_{j ≠ i} (α_i - α_j),

as we immediately see by substituting X = α_i in the general expression

    f'(X) = a_0 Σ_{i=1}^{n} ∏_{j ≠ i} (X - α_j),

which is obtained by differentiating the product f(X) = a_0 ∏_{j=1}^{n} (X - α_j). Thus,

    Res(f,f') = a_0^{n-1} ∏_{i=1}^{n} a_0 ∏_{j ≠ i} (α_i - α_j) = a_0^{2n-1} ∏_{i ≠ j} (α_i - α_j)
              = (-1)^{n(n-1)/2} a_0^{2n-1} ∏_{j < i} (α_i - α_j)^2 = (-1)^{n(n-1)/2} a_0 D(f). □


Formula (20) gives an explicit formula for the discriminant. 
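As an illustration of formula (20), the following sketch computes D(f) from the coefficients of f alone, for polynomials of degree n ≥ 2 over the rationals. Coefficient lists are taken highest power first, and the helper names are hypothetical; the determinant is evaluated by cofactor expansion, which is adequate only for small degrees.

```python
from fractions import Fraction

def derivative(f):
    """Coefficients of f', for f = [a_0, ..., a_n] with the highest power first."""
    n = len(f) - 1
    return [(n - i) * a for i, a in enumerate(f[:-1])]

def sylvester(f, g):
    """m shifted rows of f followed by n shifted rows of g."""
    n, m = len(f) - 1, len(g) - 1
    rows = [[0] * i + list(f) + [0] * (m - 1 - i) for i in range(m)]
    rows += [[0] * i + list(g) + [0] * (n - 1 - i) for i in range(n)]
    return rows

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def discriminant(f):
    """D(f) = (-1)^(n(n-1)/2) * Res(f, f') / a_0, as in formula (20)."""
    n = len(f) - 1
    sign = (-1) ** (n * (n - 1) // 2)
    return Fraction(sign * det(sylvester(f, derivative(f))), f[0])

cubic = discriminant([1, 0, -7, 6])     # x^3 - 7x + 6, roots 1, 2, -3
nonmonic = discriminant([2, -8, 6])     # 2(x - 1)(x - 3)
```

For the incomplete cubic the value agrees with -4a^3 - 27b^2 from (16), and for the non-monic quadratic with a_0^2 (α_1 - α_2)^2.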


EXERCISES 


1. Let p be a prime number. Using Newton's formulas (9) and (10), show that

    Σ_{i=1}^{p-1} i^m ≡ -1 (mod p), if m is divisible by p-1,
    Σ_{i=1}^{p-1} i^m ≡  0 (mod p), if m is not divisible by p-1.

2. Let c_1, c_2, c_3 be the complex roots of the polynomial X^3 - X + 1. What can


js) Ye) 
be said about the extension Qc, + co + Cc, ey, 


3. A polynomial f(X_1, ..., X_n) over a field K of characteristic ≠ 2 is called skew-symmetric if (πf)(X_1, ..., X_n) = ε_π f(X_1, ..., X_n), ∀π ∈ S_n, where ε_π is the sign of the permutation π. Thus, Δ_n is an example of a skew-symmetric polynomial. Show that every skew-symmetric polynomial f ∈ K[X_1, ..., X_n] is of the form f = Δ_n · g, where g is a symmetric polynomial.


4. Using property Res 2 and the existence of a splitting field for a polynomial (see Theorem 2 in the next section), show that

    Res(fg, h) = Res(f, h) · Res(g, h).

5. Use Exercise 4 and Res 3 to derive the formula:

    D(fg) = D(f) D(g) [Res(f,g)]^2.


6. What is Res(f(X), X-a) equal to? 


7. Show that D(X^n + a) = (-1)^{n(n-1)/2} n^n a^{n-1}.

8. Let f(X) = X^{n-1} + X^{n-2} + ... + 1. Using the relation X^n - 1 = (X - 1) f(X) and the preceding exercises, show that

    D(f) = (-1)^{(n-1)(n-2)/2} n^{n-2}.

§3. C is algebraically closed


1. Statement of the fundamental theorem. Let K be a field, and let f be a polynomial over K. As noted in Subsection 2 of §1, the behavior of the polynomial function f: K → K associated to f very much depends on the field K. In particular, we can conclude that Im f is all of K if deg f > 0 as soon as we know that K satisfies the following


Definition. A field K is called algebraically closed if every polynomial in K[X] decomposes into a product of linear factors. Equivalently, K is algebraically closed if the only polynomials irreducible over K are those of degree 1 (linear polynomials).

Note that if every polynomial f ∈ K[X] of positive degree has at least one root in K, then K is algebraically closed, because then we can write f(X) = (X - a)h(X), where a ∈ K and h ∈ K[X]. Applying the condition to h, we have h(X) = (X - b)r(X), where b ∈ K and r ∈ K[X]; continuing this process, we finally obtain f(X) as a product of linear factors. Since this holds for any polynomial f, that means that K is algebraically closed.

It turns out that every field K has an extension \bar{K} ⊃ K which is algebraically closed (Steinitz' Theorem). But at first glance it seems difficult to comprehend not only the construction of \bar{K} but even what it means to have such a field. So we are very fortunate to have at our disposal an excellent and important example of an algebraically closed field. It is this fact which is the subject of the so-called "Fundamental Theorem of Algebra":

THEOREM 1. The field of complex numbers C is algebraically closed.

We re-state this fundamental fact in terms of roots: Any polynomial f(X) ∈ C[X] of degree n ≥ 1 has exactly n complex roots counting multiplicity.

The pretentious name "Fundamental Theorem" for Theorem 1 goes back to the days 
when solving algebraic equations was the foremost activity of algebra. In modern times 
Theorem 1 is considered to be one of the basic (but not the basic) theorems of algebra. 

The first rigorous proof of this theorem was given by Gauss in 1799. Since that time, there have been many different proofs, of varying degrees of "algebraicity". It is always necessary in one form or another to use the continuity properties of the fields R and C (in other words, their topology); there is even a completely non-algebraic and rather short proof of the Fundamental Theorem based on a fairly deep fact about analytic functions of a complex variable. Below we shall give a proof which is, in spirit, the most algebraic of those which are accessible to us at this point. Perhaps the most natural proof to give would be one using Galois theory, but we shall have to be satisfied with a different type of algebraic proof.


The non-algebraic part of the proof of Theorem 1 is contained in the following two 


lemmas. 
LEMMA 1. Let

    f(X) = a_0 X^n + a_1 X^{n-1} + ... + a_{n-1} X + a_n     (1)

be a polynomial of degree n ≥ 1 with complex coefficients. Then there exists a positive number r ∈ R such that for |z| ≥ r, z ∈ C, we have

    |a_0 z^n| > |a_1 z^{n-1} + ... + a_{n-1} z + a_n|.

Proof. Set A = max(|a_1|, ..., |a_n|) and r = A/|a_0| + 1. We may assume that A > 0, since otherwise the right-hand side is zero and the assertion is obvious. If we take |z| ≥ r > 1, then we obtain |a_0| ≥ A/(|z| - 1), so that, by the rules on the moduli of complex numbers (see §1 Ch. 5), we have

    |a_0 z^n| = |a_0| |z|^n ≥ A |z|^n / (|z| - 1) > A (|z|^n - 1)/(|z| - 1)
              = A (|z|^{n-1} + ... + |z| + 1) ≥ |a_1| |z|^{n-1} + ... + |a_{n-1}| |z| + |a_n|
              ≥ |a_1 z^{n-1} + ... + a_{n-1} z + a_n|. □

COROLLARY. Suppose that the polynomial (1) of degree n ≥ 1 has real coefficients. Then the sign of (the real number) f(x) is the same as the sign of the leading term a_0 x^n for all x ∈ R which are sufficiently large in absolute value. □


LEMMA 2. A polynomial of odd degree with real coefficients has at least one real root.

Proof. Since n is odd, the leading term a_0 x^n of the polynomial function f: R → R takes values with opposite signs for positive and negative x ∈ R. If we take x sufficiently large in absolute value, by the corollary to Lemma 1, we know that f(x) also has opposite signs for positive and negative x. For example, if a_0 > 0, then f(-r) < 0 and f(r) > 0 if r is the real number in the proof of Lemma 1. We know from calculus (and it is not hard to prove directly) that the polynomial function f is continuous. But a continuous function f has the property that it takes every value between f(-r) and f(r) on the interval -r ≤ x ≤ r. In particular, we have f(c) = 0 for some c with |c| < r. The same argument applies if a_0 < 0. □
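The two lemmas together suggest a simple numerical procedure: the number r from the proof of Lemma 1 brackets a real root of an odd-degree polynomial inside [-r, r], and repeated halving of the interval then locates it. The Python sketch below works in floating point, so it only approximates the root whose existence the lemma asserts; the function names and the fixed number of halvings are choices made for this example.

```python
def radius(coeffs):
    """r = A/|a_0| + 1 from the proof of Lemma 1, for coeffs = [a_0, ..., a_n]."""
    return max(abs(a) for a in coeffs[1:]) / abs(coeffs[0]) + 1

def evaluate(coeffs, x):
    """Horner evaluation of a_0 x^n + ... + a_n."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

def real_root_odd(coeffs, steps=200):
    """Approximate a real root of an odd-degree real polynomial by bisection."""
    if (len(coeffs) - 1) % 2 != 1 or coeffs[0] == 0:
        raise ValueError("need a polynomial of odd degree with a_0 != 0")
    r = radius(coeffs)
    lo, hi = -r, r
    if evaluate(coeffs, lo) > 0:      # case a_0 < 0: swap so that f(lo) < 0 < f(hi)
        lo, hi = hi, lo
    for _ in range(steps):
        mid = (lo + hi) / 2
        if evaluate(coeffs, mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

f = [1, 0, -2, -5]                    # x^3 - 2x - 5
r = radius(f)
root = real_root_odd(f)
```

The inequality of Lemma 1 can also be checked at any complex z with |z| = r by comparing abs(a_0 z^n) with the modulus of the remaining terms.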


This geometrically and intuitively clear assertion concludes the non-algebraic part of our proof. We shall give the next step in a context not directly related to C, since it involves a construction which is of independent interest.


2. The splitting field of a polynomial. It often happens that a "glance from the side" at a well-known example makes it possible for us to understand it better and come up with useful generalizations. Recall the realization of C as the quotient ring R[X]/(X^2 + 1)R[X] (see Theorem 6 in §2 Ch. 5). If we replace R by an arbitrary field K and replace X^2 + 1 by an arbitrary polynomial f ∈ K[X], we obtain the "ring of residue classes modulo (f)", or, equivalently, the quotient ring K[X]/(f), where (f) = f · K[X] is the ideal in K[X]. The ideal (f) consists of all polynomials which are divisible by f; according to the corollary to Theorem 5 of §2 Ch. 5, any ideal in K[X] is of this form for some f. Just as we have an analogy between the rings Z and K[X], we also have an analogy between the residue rings Z_n = Z/(n) and K[X]/(f). It is worthwhile for us now to repeat the basic steps in the construction of Z_n (see §4 Ch. 4) for K[X].

The elements of the quotient ring K[X]/(f) are the residue classes \bar{g} = g + (f), each of which can be represented in the form r + (f), where deg r < deg f. As in the case of Z_n, this is proved by the division algorithm: if g = qf + r, then g + (f) = r + qf + (f) = r + (f), since qf ∈ (f). It is easy to see that the elements \bar{a}, a ∈ K, form a subring of K[X]/(f) which is isomorphic to the field K. Next, if f is reducible over K, i.e., if we can write f = f_1 f_2, where f_i ∈ K[X] and 0 < deg f_i < deg f, then this means that there are non-trivial zero divisors in K[X]/(f), namely, \bar{f_i} ≠ 0, i = 1, 2, but \bar{f_1} \bar{f_2} = \bar{f} = 0.
i] 


Now suppose that f is an irreducible polynomial. If deg r < deg f ($r \ne 0$), then
g.c.d.(r, f) = 1 and ur + vf = 1 for suitable $u, v \in K[X]$ (see Theorem 3 of §3
Ch. 5). In other words,

    $\bar{u}\,\bar{r} = ur + (f) = 1 - vf + (f) = 1 + (f) = \bar{1}$,

and hence

    $\bar{u} = \bar{r}^{-1}$.

Thus, any element $\bar{r} \ne \bar{0}$ has an inverse $\bar{u} = \bar{r}^{-1}$ in K[X]/(f). This shows that,
whenever f is an irreducible polynomial, the quotient ring K[X]/(f) is a field
containing a subfield isomorphic to K.
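
The relation ur + vf = 1 can be computed by the Euclidean algorithm, which makes the inverse of a residue class explicit. The sketch below does this for K = Z_p with p prime; polynomials are stored as coefficient lists with the constant term first. All function names are illustrative assumptions, and the code assumes that f is irreducible and that r is not divisible by f.

```python
def trim(a):
    """Drop trailing zero coefficients (lists store the constant term first)."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_sub(a, b, p):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])


def poly_mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)


def poly_divmod(a, b, p):
    """Division algorithm in Z_p[X]: a = q*b + r with deg r < deg b (b trimmed, non-zero)."""
    a = trim([x % p for x in a])
    q = [0] * max(len(a) - len(b) + 1, 1)
    inv_lead = pow(b[-1], -1, p)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        factor = (a[-1] * inv_lead) % p
        q[shift] = factor
        a = poly_sub(a, [0] * shift + [(factor * c) % p for c in b], p)
    return trim(q), a


def inverse_mod(r, f, p):
    """Return u with u*r = 1 in Z_p[X]/(f); assumes f irreducible and r != 0 mod f."""
    f = trim([c % p for c in f])
    # Extended Euclid: keep u0*r = r0 and u1*r = r1 modulo f.
    r0, r1 = f, poly_divmod(r, f, p)[1]
    u0, u1 = [], [1]
    while r1:
        q, rem = poly_divmod(r0, r1, p)
        r0, r1 = r1, rem
        u0, u1 = u1, poly_sub(u0, poly_mul(q, u1, p), p)
    # r0 is now a non-zero constant (the g.c.d.); normalise it to 1.
    c_inv = pow(r0[0], -1, p)
    return poly_divmod([(c_inv * c) % p for c in u0], f, p)[1]


# In Z_3[X]/(X^2 + 1): the inverse of the class of 1 + X.
u = inverse_mod([1, 1], [1, 0, 1], 3)
```

Here `pow(x, -1, p)` (Python 3.8 or later) supplies inverses in Z_p itself, so the routine reduces inversion in K[X]/(f) to inversion in K, exactly as in the argument above.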


One element in K[X]/(f) is $\bar{X}$. For any $a_0, a_1, \ldots, a_m \in K$ we have

    $\sum_{k=0}^{m} \bar{a}_k \bar{X}^k = \sum_k (a_k + (f))(X + (f))^k = \sum_k (a_k X^k + (f)) = \sum_k a_k X^k + (f) = \overline{\sum_k a_k X^k}$.

Briefly, we can say that, if $g(Y) = \sum_k a_k Y^k \in K[Y]$, then $g(\bar{X}) = \overline{g(X)}$. Of course,
when we write $g(\bar{X})$, we use the fact that K[X]/(f) contains a field isomorphic to K to
think of the coefficients of g as elements $\bar{a}_k$ in K[X]/(f). Applying this to f, we
have

    $f(\bar{X}) = \overline{f(X)} = f + (f) = (f) = \bar{0}$,

i.e., the element $\bar{X} \in K[X]/(f)$ is a root of the polynomial f.


Thus, we have the following two results. 




THEOREM 2. The ring of residue classes (quotient ring) K[X]/(f) is a field if
and only if f is irreducible over K. □

COROLLARY. If f(X) is any irreducible polynomial over K, there exists an
extension F of the field K in which f(X) has at least one root. We can take the field
K[X]/(f) for F. □


It is customary to say that the extension F is obtained by adjoining to K a root
c of the polynomial f; we write F = K(c). We then have f(X) = (X - c)g(X), where
$g \in F[X]$. We now have a real possibility of constructing an extension of K in which f
splits completely into a product of linear factors.


Definition. Let K be a field, and let f be a monic polynomial (not necessarily
irreducible) of degree n over K. Then a field $F \supset K$ is called a splitting field of f
if $f(X) = (X - c_1) \cdots (X - c_n)$ in F[X] and $F = K(c_1, \ldots, c_n)$, i.e., F is
obtained from K by adjoining the roots $c_1, \ldots, c_n$ of the polynomial f.


THEOREM 3. For every monic polynomial $f \in K[X]$ of degree $n \ge 1$, there
exists at least one splitting field.


Proof. The condition that f be monic is not really needed, and is included only for
convenience. Let

    $f(X) = f_1(X) \cdots f_r(X)$

be the factorization of f into monic irreducible factors in K[X]. According to the
corollary to Theorem 2, there exists an extension $K_1 \supset K$ containing at least one root of
$f_1$. Of course, this root $c_1$ will also be a root of f. Suppose we have already found an
extension $K_k \supset \cdots \supset K_1 \supset K$ over which f has factorization in the form

    $f(X) = (X - c_1) \cdots (X - c_k)\, g_1(X) \cdots g_s(X)$

with k (not necessarily distinct) linear factors, where k < n. If we again apply the
corollary to Theorem 2, this time to the field $K_k$ and the monic irreducible polynomial
$g_1 \in K_k[X]$, we obtain a field $K_{k+1} \supset K_k$ in which we can split off another linear factor
$X - c_{k+1}$ with $c_{k+1} \in K_{k+1}$. Continuing in this way, we finally obtain a complete
decomposition of f into a product of linear factors over some extension $K_n \supset K$. Either
$K_n$ or a subfield $F = K(c_1, \ldots, c_n)$ in $K_n$ will be the desired splitting field for f.
(We have not ruled out the possibility that F is just K.) □


The proof of Theorem 3 contains too many choices, so we cannot claim to have 
proved uniqueness of the splitting field of f. Although, in fact, any two splitting fields of 
the same polynomial must be isomorphic, i.e., the splitting field is unique up to isomor- 
phism, the proof of this is somewhat harder. For now, we do not need this uniqueness 


property. 


Examples. 1) The quadratic field $\mathbb{Q}(\sqrt{d})$ is the splitting field of the polynomial
$X^2 - d$.

2) If we adjoin to $\mathbb{Z}_2$ a root $\theta$ of the irreducible polynomial $X^2 + X + 1$, we
obtain a field $\mathbb{Z}_2(\theta) = \{0, 1, \theta, 1+\theta\}$ having four elements which is isomorphic to the
field $\mathbb{Z}_2[X]/(X^2 + X + 1)$, and also to the field GF(4) in Subsection 6 of §4 Ch. 4.
Notice that $X^2 + X + 1 = (X - \theta)(X - \theta^2)$; hence, $\mathbb{Z}_2(\theta)$ is a splitting field for the
polynomial $X^2 + X + 1$.


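3) The polynomial $X^2 + 1$ is irreducible not only over $\mathbb{R}$ (over which its
splitting field is $\mathbb{C}$), but also over some other fields, for example, over $\mathbb{Z}_3$. Suppose
$\theta^2 = -1$, $\theta \notin \mathbb{Z}_3$ (more precisely, let $\theta$ be the element $X + (X^2 + 1)\mathbb{Z}_3[X]$ in the
residue class field $\mathbb{Z}_3[X]/(X^2 + 1)$). Since $X^2 + 1 = (X - \theta)(X + \theta)$, it follows that
$\mathbb{Z}_3(\theta) = \{a + b\theta \mid a, b \in \mathbb{Z}_3\}$ is the splitting field of $X^2 + 1$ over $\mathbb{Z}_3$. By the way,
$\mathbb{Z}_3(\theta)$ is isomorphic to the field of matrices

    $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$, $a, b \in \mathbb{Z}_3$,

in Exercise 14 of §4 Ch. 4; here is the isomorphism:

    $a + b\theta \mapsto a \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + b \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$.

Notice that $\mathbb{Z}_3(\theta)^* = \langle \lambda \rangle$, where $\lambda = 1 + \theta$, $\lambda^2 = -\theta$, $\lambda^3 = 1 - \theta$, $\lambda^4 = -1$,
$\lambda^5 = -1 - \theta$, $\lambda^6 = \theta$, $\lambda^7 = -1 + \theta$, $\lambda^8 = 1$, i.e., the multiplicative group of the
field $\mathbb{Z}_3(\theta)$ is not only abelian, but is actually cyclic.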
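
The list of powers of $\lambda = 1 + \theta$ in Example 3 is easy to check by machine. In the sketch below an element $a + b\theta$ of $\mathbb{Z}_3(\theta)$ is stored as the pair (a, b), and the rule $\theta^2 = -1$ gives the multiplication formula; the names `mul`, `lam` and `powers` are illustrative choices.

```python
P = 3  # we work in Z_3(theta) with theta^2 = -1


def mul(x, y):
    """Multiply a + b*theta and c + d*theta, stored as the pairs (a, b) and (c, d)."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % P, (a * d + b * c) % P)


lam = (1, 1)       # lambda = 1 + theta
powers = []        # powers[k - 1] will hold lambda^k
current = (1, 0)   # the unit element
for _ in range(8):
    current = mul(current, lam)
    powers.append(current)
```

The eight entries of `powers` are pairwise distinct and the last one is the unit element, so $\lambda$ has order 8 and generates the whole multiplicative group.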


4) According to the Eisenstein criterion, the polynomial $X^3 - 2$ is irreducible
over $\mathbb{Q}$. Since not all of its roots are real, it follows that $\mathbb{Q}(\sqrt[3]{2})$ cannot be the splitting
field. It turns out that the splitting field is $\mathbb{Q}(\sqrt[3]{2}, \varepsilon)$, where $\varepsilon$ is a primitive cube
root of 1:

    $X^3 - 2 = (X - \sqrt[3]{2})(X - \varepsilon\sqrt[3]{2})(X - \varepsilon^2\sqrt[3]{2})$.


3. Proof of the Fundamental Theorem. All we need from the preceding subsection
is Theorem 3.

According to the remark immediately following the definition of an algebraically
closed field, we need only prove that any polynomial (1) has at least one complex root. We
first suppose that all of the coefficients of f are real. Without loss of generality, we may
assume that $a_0 = 1$ and $a_n \ne 0$. Let

    $\deg f = n = 2^m n_0$,

where $n_0$ is an odd integer. If m = 0, we know by Lemma 2 that f has a root (in
fact, a real root). Using induction on m, we suppose that the existence of a root has been
proved for all polynomials with real coefficients whose degree has the form $2^{m'} n_0'$ with
$m' \le m - 1$ (there are no restrictions on the odd factor $n_0'$).

We consider the splitting field F over $\mathbb{C}$ of the polynomial $(X^2 + 1) f(X)$. Let
$u_1, \ldots, u_n$ be the roots of f in F. We consider the following elements of F:

    $v_{ij} = u_i u_j + a(u_i + u_j)$, $1 \le i < j \le n$,        (2)

where a is a fixed real number. (We should actually write $v_{ij}(a)$, but we shall omit
the a in the notation for the sake of brevity.) The number n' of elements of the form
(2) is equal to

    $n' = \frac{n(n-1)}{2} = \frac{2^m n_0 (2^m n_0 - 1)}{2} = 2^{m-1} n_0'$,        (3)

where $n_0'$ is an odd number. The polynomial

    $f_a(X) = \prod_{1 \le i < j \le n} (X - v_{ij}) = X^{n'} + b_1 X^{n'-1} + \cdots + b_{n'} \in F[X]$

has degree n', and, by definition, its roots are precisely the elements (2). By Vieta's
formulas (12) in §1, the coefficients $b_1, \ldots, b_{n'}$ are plus or minus the elementary
symmetric functions $s_k$ of the $v_{ij}$. If we substitute the expression for $v_{ij}$
in terms of $u_i$ and $u_j$ in $s_k(v_{12}, v_{13}, \ldots, v_{n-1,n})$, we obtain the function

    $h_k(u_1, \ldots, u_n) = s_k(\ldots, u_i u_j + a(u_i + u_j), \ldots)$,

which we claim is also a symmetric function. This is true because, for any permutation
$\pi \in S_n$ ($S_n$ is the symmetric group on n elements) we have

    $\pi v_{ij} = u_{\pi(i)} u_{\pi(j)} + a(u_{\pi(i)} + u_{\pi(j)}) = v_{\pi(i),\pi(j)}$

(or $v_{\pi(j),\pi(i)}$ if $\pi(i) > \pi(j)$), so that $\pi$ induces a permutation $\tilde{\pi}$ on the set of
elements (2). Since $s_k(v_{12}, \ldots, v_{n-1,n})$ is symmetric, it does not change when
the arguments are permuted; hence,

    $(\pi h_k)(u_1, \ldots, u_n) = s_k(\tilde{\pi} v_{12}, \ldots, \tilde{\pi} v_{n-1,n}) = s_k(v_{12}, \ldots, v_{n-1,n}) = h_k(u_1, \ldots, u_n)$.

We note that $h_k(u_1, \ldots, u_n)$ is the value at $X_i = u_i$, i = 1, ..., n, of a symmetric
polynomial $h_k(X_1, \ldots, X_n)$ having real coefficients that depend only on $a \in \mathbb{R}$.
By the basic theorem on symmetric polynomials (Theorem 1 of §2), there exists a
polynomial $g_k(Y_1, \ldots, Y_n)$ with real coefficients such that

    $h_k(X_1, \ldots, X_n) = g_k(s_1(X_1, \ldots, X_n), \ldots, s_n(X_1, \ldots, X_n))$.

Hence,

    $(-1)^k b_k = h_k(u_1, \ldots, u_n) = g_k(s_1(u_1, \ldots, u_n), \ldots, s_n(u_1, \ldots, u_n)) = g_k(-a_1, \ldots, (-1)^n a_n) \in \mathbb{R}$

(recall that the $a_i$ are the coefficients of our original polynomial $f \in \mathbb{R}[X]$).
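
The conclusion that the $b_k$ are real can be tested numerically on a small example. The sketch below takes a real polynomial whose roots $u_i$ are known in advance, forms the elements $v_{ij}$ of (2) for one value of a, and expands $\prod (X - v_{ij})$. The sample polynomial, the value a = 3 and the helper `poly_from_roots` are assumptions made for illustration; the check does not replace the proof, which must work without knowing the roots.

```python
from itertools import combinations


def poly_from_roots(roots):
    """Coefficients (leading first) of the monic polynomial with the given roots."""
    coeffs = [1 + 0j]
    for root in roots:
        # Multiply the current polynomial by (X - root).
        shifted = coeffs + [0j]
        for i in range(1, len(shifted)):
            shifted[i] -= root * coeffs[i - 1]
        coeffs = shifted
    return coeffs


# Roots of f(X) = (X^2 + 1)(X^2 - 2X + 5), a real polynomial of degree n = 4.
u = [1j, -1j, 1 + 2j, 1 - 2j]
a = 3.0  # the real parameter in (2)

# The n' = n(n-1)/2 = 6 elements v_ij = u_i u_j + a (u_i + u_j), i < j.
v = [ui * uj + a * (ui + uj) for ui, uj in combinations(u, 2)]
f_a = poly_from_roots(v)  # coefficients 1, b_1, ..., b_6
```

Each individual $v_{ij}$ is in general not real, yet every entry of `f_a` has vanishing imaginary part, in agreement with the symmetric-function argument.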

Thus, the coefficients $b_k$ of the polynomial $f_a(X)$ are real for any $a \in \mathbb{R}$.
Since $\deg f_a = n' = 2^{m-1} n_0'$ (see (3)), it follows by the induction assumption that $f_a$
has at least one complex root, which, of course, must be one of the $v_{ij}$. Thus, at least
one of the $v_{ij}$ is not only in F but in the (perhaps smaller) field $\mathbb{C}$. If we vary the
parameter $a \in \mathbb{R}$, we obtain other polynomials $f_a(X)$ with real coefficients, and for
each such polynomial there is a pair of indices i < j (depending on a) such that the
element $v_{ij} = u_i u_j + a(u_i + u_j) \in F$ is an element of $\mathbb{C}$. Since there are only
finitely many pairs of indices {i, j} and infinitely many real numbers a, there must be
two distinct real numbers a and a' with the same {i, j}. We may suppose that this
pair {i, j} is {1, 2} (re-numbering, if necessary, the $u_1, \ldots, u_n$). Thus,

    $u_1 u_2 + a(u_1 + u_2) = c$,
                                                        (4)
    $u_1 u_2 + a'(u_1 + u_2) = c'$, $a \ne a'$,

are both complex numbers. The system of equations (4) implies that

    $u_1 + u_2 = \frac{c - c'}{a - a'}$, $u_1 u_2 = c - a\,\frac{c - c'}{a - a'}$

also belong to the field $\mathbb{C}$. Hence, $u_1$ and $u_2$ are the roots of the quadratic equation

    $(X - u_1)(X - u_2) = X^2 - (u_1 + u_2)X + u_1 u_2 = 0$

with complex coefficients. By the well-known formula for the roots of a quadratic equation,
we have

    $u_{1,2} = \frac{(u_1 + u_2) \pm \sqrt{(u_1 + u_2)^2 - 4 u_1 u_2}}{2}$,

so that $u_1$ and $u_2$ are also complex numbers. Thus, we have found a complex root
(actually, two complex roots) of the polynomial f(X) under the assumption that f has
real coefficients.

Now let

    $f(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n$

be an arbitrary polynomial of degree n with complex coefficients (we may assume that
$a_0 = 1$, but this is unimportant). If we replace each $a_i$ by its complex conjugate, we
obtain the polynomial

    $\bar{f}(X) = \bar{a}_0 X^n + \bar{a}_1 X^{n-1} + \cdots + \bar{a}_{n-1} X + \bar{a}_n$.

We now consider the polynomial

    $e(X) = f(X)\,\bar{f}(X) = e_0 X^{2n} + e_1 X^{2n-1} + \cdots + e_{2n}$

of degree 2n with coefficients

    $e_k = \sum_{i+j=k} a_i \bar{a}_j$, k = 0, 1, ..., 2n.

Since conjugation $z \mapsto \bar{z}$ is an automorphism of $\mathbb{C}$ of order 2 (see Theorem 1 of §1
Ch. 5), we have $\bar{e}_k = \sum_{i+j=k} \bar{a}_i a_j = e_k$, and this means that $e_k \in \mathbb{R}$. Since we have
proved that a polynomial with real coefficients has at least one complex root, it follows that
for some $c \in \mathbb{C}$

    $f(c) \cdot \bar{f}(c) = e(c) = 0$.

This means that either f(c) = 0, in which case the theorem is proved, or else $\bar{f}(c) = 0$,
i.e., $\bar{a}_0 c^n + \bar{a}_1 c^{n-1} + \cdots + \bar{a}_{n-1} c + \bar{a}_n = 0$. Applying complex conjugation to both
sides of this equation, we obtain $a_0 \bar{c}^n + a_1 \bar{c}^{n-1} + \cdots + a_{n-1} \bar{c} + a_n = 0$, i.e.,
$f(\bar{c}) = 0$. □
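
The key point of the last step, that $e = f \bar{f}$ has real coefficients, is easy to observe on an example. A minimal Python sketch follows; the sample polynomial and the helper `poly_mul` are illustrative choices.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (leading coefficient first)."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y   # contributes a_i * conj(a_j) to e_{i+j}
    return out


f = [1, 2 - 1j, 3j]                            # f(X) = X^2 + (2 - i)X + 3i
f_bar = [complex(c).conjugate() for c in f]    # conjugate every coefficient
e = poly_mul(f, f_bar)                         # e(X) = f(X) * f_bar(X), degree 4
```

The imaginary parts cancel in pairs, since the terms $a_i \bar{a}_j$ and $a_j \bar{a}_i$ occurring in $e_k$ are complex conjugates of each other.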


The fact that $\mathbb{C}$ is algebraically closed and the fact that splitting fields exist
are both useful in solving a wide variety of problems.


Example. Let $S_0(f)$ be the set of zeros of a polynomial $f \in \mathbb{C}[X]$, and let
$S_1(f)$ be the set of "ones", i.e., $d \in S_1(f) \Leftrightarrow f(d) = 1$. Now suppose that f and g
are two polynomials in $\mathbb{C}[X]$. We claim that

    $S_0(f) = S_0(g)$, $S_1(f) = S_1(g)$ $\Rightarrow$ f(X) = g(X).

Since obviously $S_0(f) \cap S_1(f) = \emptyset$, by the results of §1 it suffices to show that
$|S_0(f) \cup S_1(f)| \ge n + 1$, where n = deg f, since in that case f - g will be a
polynomial with more distinct roots than its degree. By Theorem 1, we have

    $f(X) = a_0 \prod_{i=1}^{\nu} (X - c_i)^{s_i}$, $f(X) - 1 = a_0 \prod_{j=1}^{\mu} (X - d_j)^{t_j}$,

where

    $s_1 + \cdots + s_\nu = n = t_1 + \cdots + t_\mu$, $\nu = |S_0(f)|$, $\mu = |S_1(f)|$.

According to Theorem 5 of §1, we have

    $f'(X) = \prod_{i=1}^{\nu} (X - c_i)^{s_i - 1} \prod_{j=1}^{\mu} (X - d_j)^{t_j - 1}\, h(X)$,

so that $(n - \nu) + (n - \mu) = \sum (s_i - 1) + \sum (t_j - 1) \le \deg f'(X) = n - 1$. Hence,
$\nu + \mu \ge n + 1$.
§4. Polynomials with real coefficients

1. Factorization in $\mathbb{R}[X]$. It follows from Theorem 1 of §3 that every
polynomial f of degree n in $\mathbb{C}[X]$ can be written uniquely (except for the order of the
factors) in the form

    $f(X) = a(X - c_1)(X - c_2) \cdots (X - c_n)$,

where $a \ne 0$ and $c_1, \ldots, c_n$ are complex numbers. Now let
$f(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n$ be a monic polynomial with real coefficients
$a_1, \ldots, a_n$, and let c be a complex root of f, which we write in the form
c = u + iv, $u, v \in \mathbb{R}$. If $c \notin \mathbb{R}$, i.e., if $v \ne 0$, and we apply complex conjugation
to the equation f(c) = 0, as in the proof of Theorem 1 of §3, we find that $f(\bar{c}) = 0$
as well, because $\bar{a}_i = a_i$. Hence, f(X) is divisible by the quadratic polynomial

    $g(X) = (X - c)(X - \bar{c}) = X^2 - (c + \bar{c})X + c\bar{c} = X^2 - 2uX + (u^2 + v^2)$

with negative discriminant $D(g) = 4u^2 - 4(u^2 + v^2) = -4v^2 < 0$. The condition
D(g) < 0 is equivalent to the irreducibility of $g \in \mathbb{R}[X]$ over $\mathbb{R}$.

Next, suppose that k is the multiplicity of the root c of f(X), and that $\ell \le k$
is the multiplicity of the root $\bar{c}$. Then f(X) is divisible by the $\ell$-th power of g(X):

    $f(X) = g(X)^{\ell}\, q(X)$.

The quotient q(X) of the two polynomials in $\mathbb{R}[X]$ is also a polynomial in $\mathbb{R}[X]$,
and if $k > \ell$ the complex number c will be a root of q(X) of multiplicity $k - \ell$,
while $\bar{c}$ will not be a root of q(X). But we saw that this is impossible. Hence, $k = \ell$
(the case $\ell > k$ is handled similarly). Thus, the complex roots of any polynomial in
$\mathbb{R}[X]$ occur in conjugate pairs, where conjugate roots have the same multiplicity. Since
$\mathbb{R}[X]$ is a unique factorization domain, we now have the following theorem.


THEOREM 1. Any monic polynomial $f \in \mathbb{R}[X]$ of degree n factors uniquely
(except for the order of the factors) into a product of $m \le n$ linear factors $X - c_i$
corresponding to the real roots $c_1, \ldots, c_m$, and (n - m)/2 quadratic factors, which
are irreducible over $\mathbb{R}$ and correspond to conjugate pairs of complex roots of f. □


Remarks. 1) Any irreducible polynomial in $\mathbb{R}[X]$ is either a linear polynomial
or else a quadratic polynomial with negative discriminant.

2) In the notation of Theorem 1, we have the relation

    $D(f) = (-1)^{(n-m)/2}\, |D(f)|$,

i.e., the sign of the discriminant is determined by the number of conjugate pairs of roots.
This equality can be obtained either directly from the definition of the discriminant, or else
using the formula in Exercise 5 of §2.

3) All primary rational functions in the field $\mathbb{R}(X)$ are of the form given in (9)
§4 Ch. 5.


2. The problem of isolating the roots of a polynomial. We shall think of a
polynomial $f \in \mathbb{R}[X]$ as a real-valued function $x \mapsto f(x)$ of the real variable x, which
we depict by a graph on the xy-plane. The real roots of the polynomial f(X) correspond
to the intersection of the graph with the x-axis.

The first important question that often arises in practice is to find bounds for the
real roots, i.e., an interval a < x < b which we can determine must contain all of the
real roots of a given polynomial f. Actually, from Lemma 1 of §3 we already know that
if $|x| > A/|a_0| + 1$ (where $a_0$ is the leading coefficient and
$A = \max\{|a_1|, \ldots, |a_n|\}$), then the function f(x) does not vanish (even if we allow x
to be complex). More exact bounds on the roots are given in Exercises 1-4.
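
As a small illustration of this bound, the sketch below computes $1 + A/|a_0|$ from the coefficients and compares it with the roots of a polynomial whose factorization is known. The function name `root_bound` and the sample polynomial are illustrative choices.

```python
def root_bound(coeffs):
    """Return 1 + A/|a_0| for coeffs = [a_0, a_1, ..., a_n] with a_0 != 0."""
    a0, rest = coeffs[0], coeffs[1:]
    big_a = max((abs(a) for a in rest), default=0)   # A = max |a_i|, i >= 1
    return 1 + big_a / abs(a0)


# f(X) = (X - 1)(X - 2)(X + 3) = X^3 - 7X + 6
bound = root_bound([1, 0, -7, 6])
known_roots = [1, 2, -3]
```

For this polynomial A = 7, so every root lies in the interval -8 < x < 8; the bound is crude, which is why sharper estimates are worth having.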

A more general problem is to localize the roots, i.e., for each real root to find an 
interval containing that root and no other root. The first satisfactory (though cumbersome) 
solution of this problem was given by Sturm in 1829. We shall not give the complete theory, 
which would also include the problem of isolating complex roots in regions of the complex 
plane. The simplification of the general results for various special classes of polynomials 
is a matter of great interest to specialists. We shall not discuss the methods for computing 
a “localized root" to within a given accuracy. Modern computer science has at its disposal 
a large arsenal of techniques to do this, but that subject would take us too far afield. 

Fortunately, in many situations one is satisfied to know a rough picture of the
location of the roots. Important information is furnished by drawing the graph of the function
$x \mapsto f(x)$, whose values can be computed, say, at integer values of x. Notice that the
roots of the equation f(x) = 0 will occur between extremal points (or at those points),
i.e., no two roots can occur between adjacent extremal points. These extremal points, in
turn, are the roots of the lower degree polynomial f'(X). Looking at the graph can give us
an estimate of the number of roots in a given interval -- but only an estimate, since we
might have neglected oscillations of the function $x \mapsto f(x)$ in certain small
intervals (see the diagram).

It is a remarkable fact that upper estimates for the number of positive (or negative)
roots can be obtained using a very simple observation, which Descartes made in 1637. We
introduce the following

Definition. Let

    $a_0, a_{i_1}, a_{i_2}, \ldots, a_{i_q}$        (1)

be all of the non-zero coefficients of a polynomial $f(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_n \in \mathbb{R}[X]$,
written in the indicated order. If $a_{i_k} a_{i_{k+1}} < 0$, we say that there is a change of sign at
the (k+1)-st term. We let L(f) denote the total number of changes of sign in the
sequence (1).

It is clear that we always have $0 \le L(f) \le \deg f$, and also L(-f) = L(f).
Further note that $L(f) = L(aX^k + a_{i_1} X^{n-i_1} + \cdots)$, where the exponent k need only
satisfy the condition $k > n - i_1$, and where $a a_0 > 0$. If L(f) = 0, then f obviously
does not have any positive roots. But it is possible for f not to have any positive roots
even when L(f) = deg f, for example: $f(X) = X^2 - X + 1$. But nevertheless, we shall see
that L(f) does have a direct relationship to the number of positive roots of the polynomial.

LEMMA. If c > 0, then L((X - c)f) = L(f) + 1 + 2s, where $s \in \mathbb{Z}$, $s \ge 0$.


Proof. We are assuming, of course, that $f \ne 0$, so that L(f) makes sense.
If deg f = 0, then L(f) = 0, and the lemma holds with s = 0. Using induction on
deg f, we suppose that the lemma holds for all polynomials of degree < n. Let
deg f = n, and write

    $f(X) = a_0 X^n + a_k X^{n-k} + a_{k+1} X^{n-k-1} + \cdots + a_n$,

where $a_k$ is the first non-zero coefficient after $a_0$, if there is any ($k \ge 1$). Since
L(-f) = L(f), without loss of generality we may assume that $a_0 > 0$. Set

    $g(X) = a_k X^{n-k} + \cdots + a_{n-1} X + a_n$.

We clearly have

    $L(f) = L(g) + \varepsilon$,        (2)

where

    $\varepsilon = 0$ if $a_k > 0$, $\varepsilon = 1$ if $a_k < 0$.

If g = 0, then the lemma holds trivially for f, so we suppose that $g \ne 0$. For later
use, we also set

    $(X - c)\, g(X) = a_k X^{n-k+1} + h(X)$

(note that if $g \ne 0$, then $h \ne 0$).

Using the induction assumption and (2), we have

    $L((X - c) g(X)) = L(g) + 1 + 2t = L(f) + 1 - \varepsilon + 2t$.        (3)

We also have

    $(X - c) f = a_0 X^n (X - c) + (X - c) g = a_0 X^{n+1} - a_0 c X^n + a_k X^{n-k+1} + h(X)$.

If k > 1, then obviously $L((X - c)f) = 2 - \varepsilon + L((X - c)g)$, since c > 0 ($2 - \varepsilon$ is
the number of changes of sign in the sequence $a_0, -a_0 c, a_k$). Using (3), we obtain

    $L((X - c)f) = L(f) + 1 + 2s$, where $s = t + 1 - \varepsilon \ge 0$.

It remains to consider the case k = 1:

    $(X - c) f = a_0 X^{n+1} + (a_1 - a_0 c) X^n + h(X)$.

If $a_1$ and $a_1 - a_0 c$ have the same sign, then

    $L((a_1 - a_0 c) X^n + h) = L((X - c)g)$

and

    $L((X - c)f) = \varepsilon + L((X - c)g) = L(f) + 1 + 2s$, s = t.

If $a_1$ and $a_1 - a_0 c$ have opposite signs, which can only happen if $a_1 > 0$ and $\varepsilon = 0$,
then

    $L((a_1 - a_0 c) X^n + h) = L((X - c)g) \pm 1$

and

    $L((X - c)f) = 1 + L((X - c)g) \pm 1 = L(f) + 1 + 2s$,

where s = t or t + 1. Finally, if $a_1 - a_0 c = 0$, which also can only happen if
$a_1 > 0$ and $\varepsilon = 0$, then

    $L((X - c)f) = L(a_0 X^{n+1} + h(X)) = L(a_1 X^n + h(X)) = L((X - c)g) = L(f) + 1 + 2s$, s = t. □


Using this lemma, it is easy to prove Descartes' rule of signs.

THEOREM 2. The number of positive roots of a polynomial $f \in \mathbb{R}[X]$ either is
equal to L(f) or is less than L(f) by an even number.

Proof. Let $c_1, c_2, \ldots, c_m$ be the positive roots (not necessarily distinct) of the
polynomial $f(X) = a_0 X^n + \cdots + a_{n-\nu} X^{\nu}$, where we assume that $a_0 > 0$ and $a_{n-\nu}$
is the last non-zero coefficient. Recalling how f factors in $\mathbb{R}[X]$ (Theorem 1), we may
write:

    $f(X) = (X - c_1) \cdots (X - c_m)\, g(X)$,        (4)

where $g(X) = a_0 X^{n-m} + \cdots + b X^{\nu}$ ($a_0 > 0$, b > 0, $\nu \ge 0$). Since $a_0$ and b
have the same sign, it follows that L(g) = 2t is an even number. Using the lemma and
the factorization (4), we obtain the chain of equalities

    $L((X - c_1) g) = 1 + 2(s_1 + t)$,

    $L((X - c_1)(X - c_2) g) = 1 + 2(s_1 + t) + 1 + 2 s_2 = 2 + 2(s_1 + s_2 + t)$,

    ...

    $L(f) = m + 2(s_1 + s_2 + \cdots + s_m + t)$.

The last equality gives us the theorem. □
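
The quantity L(f) and the statement of the lemma are simple to experiment with. The following Python sketch counts sign changes and multiplies a polynomial by X - c; the function names and sample polynomials are illustrative choices, and coefficients are listed with the leading coefficient first.

```python
def sign_changes(coeffs):
    """L(f): number of sign changes in the sequence of non-zero coefficients."""
    signs = [1 if a > 0 else -1 for a in coeffs if a != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def times_linear(coeffs, c):
    """Coefficients of (X - c) * f, leading coefficient first."""
    shifted = list(coeffs) + [0]
    for i in range(1, len(shifted)):
        shifted[i] -= c * coeffs[i - 1]
    return shifted


f = [1, -1, 1]              # X^2 - X + 1: L(f) = deg f, yet no positive roots
g = times_linear(f, 2)      # (X - 2)(X^2 - X + 1), one positive root
h = [1, 0, -7, 6]           # (X - 1)(X - 2)(X + 3), two positive roots
```

In each case the number of positive roots differs from the number of sign changes by an even number, and multiplying by X - 2 raises L by an odd number, as the lemma predicts.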


Thus, we always have $m \le L(f)$. We now consider a special case which is
important in practice. Suppose we know in advance that all of the roots of f are real.
Then we have a more precise fact.

THEOREM 3. If all of the roots of f are real, and if m(f) = m denotes the
number of positive roots, counting multiplicity, then m(f) = L(f).

Proof. It is fairly easy to derive Theorem 3 from Theorem 2, but there is a
simple and at the same time instructive independent proof which we shall give instead.

By Rolle's Theorem (or the Mean Value Theorem) of calculus, if a' < b' are two
roots of our polynomial f(X), then there exists a number $c \in \mathbb{R}$, a' < c < b', such
that f'(c) = 0. This implies that all of the roots of the derivative f'(X) are also real,
and that m(f') = m(f) or m(f) - 1. To see this, let $c_1 < c_2 < \cdots < c_r$ be the
roots of f, and let $n_1, n_2, \ldots, n_r$ be their respective multiplicities, so that
$n_1 + n_2 + \cdots + n_r = \deg f = n$. By Theorem 5 of §1, the derivative f' has the roots
$c_1, c_2, \ldots, c_r$ with multiplicities $n_1 - 1, n_2 - 1, \ldots, n_r - 1$; and, by Rolle's
Theorem, in each interval between successive $c_i$ there is at least one root of f', so
we also obtain roots $c_1', c_2', \ldots, c_{r-1}'$ of f'. In all this gives us
$(n_1 - 1) + (n_2 - 1) + \cdots + (n_r - 1) + r - 1 = n - 1$ real roots of f'. Since deg f' = n - 1,
this is all of its roots. Further suppose that $c_{k-1} \le 0$ while $c_k, \ldots, c_r$ are the
positive roots of f; thus, $n_k + \cdots + n_r = m(f)$. The positive roots of f'(X) are the roots
$c_k, \ldots, c_r$ with multiplicities $n_k - 1, \ldots, n_r - 1$, the roots $c_k', \ldots, c_{r-1}'$, and
perhaps $c_{k-1}'$, i.e., the number of positive roots of f' is m(f') = m(f) - 1 or m(f),
as claimed. The following formula clearly holds in both cases m(f') = m(f) and
m(f') = m(f) - 1:

    $m(f) = m(f') + \varepsilon$, $\varepsilon = \frac{1}{2}\left(1 - (-1)^{m(f) + m(f')}\right)$.        (5)


We further note that, if

    $f(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_{n-\nu} X^{\nu}$,        (6)

where $a_{n-\nu}$ is the last non-zero coefficient, and if we write f in the form (4), then
$a_{n-\nu} = (-1)^m c_1 c_2 \cdots c_m b$, where $c_i > 0$ and b > 0. In other words,

    $(-1)^{m(f)} a_{n-\nu} > 0$.        (7)

We now use induction on n = deg f to prove the theorem. Suppose that Theorem 3
holds for all polynomials of degree < n. If $\nu > 0$ in (6), i.e., if $a_n = 0$, then
$f(X) = X^{\nu} f_1(X)$, where $m(f) = m(f_1) = L(f_1) = L(f)$, since $m(f_1) = L(f_1)$ by the
induction assumption. So we may suppose that $a_n \ne 0$. Let

    $f'(X) = n a_0 X^{n-1} + \cdots + \mu a_{n-\mu} X^{\mu-1}$,

where $\mu a_{n-\mu}$ is the last non-zero coefficient of f'. Then

    $L(f) = L(f') + \delta$, $\delta = \frac{1}{2}\left(1 - \operatorname{sgn}(a_n a_{n-\mu})\right)$.

But we know (see (7)) that $(-1)^{m(f)} a_n > 0$ and $(-1)^{m(f')} a_{n-\mu} > 0$. Hence
$\operatorname{sgn}(a_n a_{n-\mu}) = (-1)^{m(f) + m(f')}$; thus, $\delta = \varepsilon$. Since L(f') = m(f') by the induction
assumption, we conclude that $L(f) = m(f') + \varepsilon = m(f)$ by (5). □
COROLLARY. Suppose that all of the roots of f are real. Then the number of
roots in the interval (a, b] is equal to $L(f_a) - L(f_b)$, where

    $f_a(X) = f(X + a) = \sum_{0 \le k \le n} \frac{f^{(k)}(a)}{k!} X^k$,

    $f_b(X) = f(X + b) = \sum_{0 \le k \le n} \frac{f^{(k)}(b)}{k!} X^k$

(using the Taylor series for f(X + a), see Exercise 3 below).

Proof. By definition, $m(f_a)$ is the number of positive roots of $f_a$, which is
equal to the number of roots of f which are greater than a. Similarly for $m(f_b)$.
Thus, the number of roots of f in the interval (a, b] is equal to the difference
$m(f_a) - m(f_b)$, which equals $L(f_a) - L(f_b)$ by Theorem 3. □
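
The corollary translates directly into a procedure: shift the polynomial, count sign changes, and subtract. A minimal Python sketch follows, using exact rational arithmetic. The function names are illustrative assumptions, the shift f(X + a) is computed by a Horner-type recursion rather than through derivatives, and the result is meaningful only under the hypothesis that all roots of f are real.

```python
from fractions import Fraction


def sign_changes(coeffs):
    """L(f): number of sign changes among the non-zero coefficients."""
    signs = [1 if a > 0 else -1 for a in coeffs if a != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def taylor_shift(coeffs, a):
    """Coefficients of f(X + a), leading coefficient first."""
    a = Fraction(a)
    result = []
    for c in coeffs:
        # result <- result * (X + a) + c
        stepped = result + [Fraction(0)]
        for i in range(1, len(stepped)):
            stepped[i] += a * result[i - 1]
        stepped[-1] += c
        result = stepped
    return result


def roots_in_interval(coeffs, a, b):
    """Number of roots in (a, b], with multiplicity, assuming all roots are real."""
    return sign_changes(taylor_shift(coeffs, a)) - sign_changes(taylor_shift(coeffs, b))


f = [1, 0, -7, 6]   # (X - 1)(X - 2)(X + 3), all roots real
count = roots_in_interval(f, 0, Fraction(3, 2))
```

For this f the interval (0, 3/2] contains only the root 1, and the difference of the two sign-change counts is indeed 1.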


3. Stable polynomials. A monic polynomial $f(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n$
with real coefficients is called stable if all of its roots lie in the left half-plane:

    $f(\lambda) = 0$, $\lambda = \alpha + i\beta$ $\Rightarrow$ $\alpha < 0$

(see Fig. 18). The terminology originates from the theory of differential equations, where
one has the following criterion for a physical system (in the broad sense of a mechanical,
technological, or economic system) to be asymptotically stable in a neighborhood of an
equilibrium position. If f is the polynomial associated to a given n-th order linear
differential equation with constant coefficients, then for any root $\lambda$ we must have

    $\lim_{t \to +\infty} e^{\lambda t} = 0$.        (8)

Fig. 18

Since, by Euler's formula (see (15) §1 Ch. 5), we have $e^{\lambda t} = e^{\alpha t} e^{i\beta t} =
e^{\alpha t}(\cos \beta t + i \sin \beta t)$, it follows that the dominating term is $e^{\alpha t}$, and the condition
(8) is equivalent to: $\alpha < 0$.

This leads to a special type of localization problem, called the Hurwitz-Routh
problem, which asks how to determine directly from the coefficients whether or not a
polynomial is stable. (Actually, this problem was first stated much earlier, in 1868, by the
British physicist Maxwell, and was solved for certain small n by the Russian engineer
Vyshnegradskii, who studied the stability problem for regulators in 1876.) The algebraic
problem was solved for any n in 1895. The Hurwitz-Routh criterion says: a polynomial
f is stable if and only if the following inequalities hold:

    $\Gamma_1 > 0$, $\Gamma_2 > 0$, ..., $\Gamma_n > 0$,        (9)

where

    $\Gamma_k = \begin{vmatrix}
    a_1 & 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\
    a_3 & a_2 & a_1 & 1 & 0 & 0 & \cdots & 0 \\
    a_5 & a_4 & a_3 & a_2 & a_1 & 1 & \cdots & 0 \\
    a_7 & a_6 & a_5 & a_4 & a_3 & a_2 & \cdots & 0 \\
    \vdots & & & & & & & \vdots \\
    a_{2k-1} & a_{2k-2} & a_{2k-3} & a_{2k-4} & a_{2k-5} & a_{2k-6} & \cdots & a_k
    \end{vmatrix}$

(we take $a_s = 0$ for s > n).

Without attempting to prove the Hurwitz-Routh theorem (such a proof belongs in other
courses), we take note of the fact that the statement of the criterion has such an elegant form
thanks to the theory of determinants. We also note that, by Theorem 1, if the conditions in
(9) are fulfilled, then the polynomial f(X) is a product of factors of the form X + u
and $X^2 + vX + w$ with u > 0, v > 0, w > 0, and this means that all of the
coefficients of a stable polynomial are positive:

    $a_1 > 0$, $a_2 > 0$, ..., $a_n > 0$.        (10)

Thus, the conditions (10) are necessary in order for f(X) to be stable. Although (10)
is not a sufficient condition for stability (i.e., unstable polynomials exist with all positive
coefficients), nevertheless the use of (10) allows us to cut in half the number of
determinant inequalities in (9). This is of great practical value, since the computation of
determinants is a very cumbersome affair.


Example. If n = 2, the system of inequalities $\Gamma_1 > 0$, $\Gamma_2 > 0$ is equivalent
to the simpler system of inequalities: $a_1 > 0$ and $a_2 > 0$; this criterion can also be
seen immediately from the formula for the roots of a quadratic equation.

If n = 3, the criterion reduces to the inequalities $a_1 > 0$, $a_2 > 0$, $a_3 > 0$,
and $a_1 a_2 > a_3$, since we have: $\Gamma_3 = a_3(a_1 a_2 - a_3)$.
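
The determinants in (9) can be evaluated mechanically. The sketch below builds the k-by-k matrix whose entry in row i and column j is $a_{2i-j}$ (with $a_0 = 1$ and $a_s = 0$ for s < 0 or s > n) and computes its determinant exactly. The function names and the two test polynomials are illustrative assumptions; in particular, this is a direct transcription of the criterion as stated, not an efficient algorithm.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det          # a row swap changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def hurwitz_minors(a):
    """a = [a_1, ..., a_n] for a monic f; returns [Gamma_1, ..., Gamma_n]."""
    n = len(a)

    def coeff(s):
        if s == 0:
            return 1            # a_0 = 1 since f is monic
        return a[s - 1] if 1 <= s <= n else 0

    minors = []
    for k in range(1, n + 1):
        gamma = [[coeff(2 * i - j) for j in range(1, k + 1)] for i in range(1, k + 1)]
        minors.append(determinant(gamma))
    return minors


def is_stable(a):
    return all(g > 0 for g in hurwitz_minors(a))


stable = is_stable([6, 11, 6])    # (X + 1)(X + 2)(X + 3)
unstable = is_stable([1, 1, 6])   # (X + 2)(X^2 - X + 3), all coefficients positive
```

The second polynomial has only positive coefficients, but its complex roots have real part 1/2, so it illustrates that condition (10) alone does not guarantee stability.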

In conclusion, we note that the Hurwitz-Routh criterion does not answer all questions 
connected with stability, since in practice we are often interested in polynomials and 
differential equations whose coefficients depend on a parameter. In that case, we want to 


express the stability conditions in terms of the parameter, and this is a problem of a very 


different sort. 
EXERCISES

1. Let $f(X) = a_0 X^n + a_1 X^{n-1} + \cdots + a_n$ be a polynomial of degree n with
real coefficients. Show that knowing an upper bound for the positive roots of the polynomials
f(X), $X^n f(1/X)$, f(-X), and $X^n f(-1/X)$ gives both upper and lower bounds for both
positive and negative roots of f(X).

2. In the notation of Exercise 1, let $a_0 > 0$, and let m be the lowest index
for which $a_m < 0$. Let B be the maximum of the absolute values of the negative
coefficients. Show that

    $c \le 1 + \sqrt[m]{B/a_0}$

for every positive real root c of f(X).

3. (Taylor's formula). Let K be a field of characteristic zero, and let $a \in K$.
Prove that any polynomial $f \in K[X]$ of degree n satisfies the formula

    $f(X) = f(a) + \frac{f'(a)}{1!}(X - a) + \frac{f''(a)}{2!}(X - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(X - a)^n$.

4. Show that, if $f(X) \in \mathbb{R}[X]$ has degree n and positive leading coefficient
$a_0$, and if f(a) > 0, $f'(a) \ge 0$, ..., $f^{(n-1)}(a) \ge 0$, then f(c) = 0, c > 0 $\Rightarrow$ c < a.


5. Using Descartes’ rule of signs, find the sign of the discriminant of the poly- 


monials x 5 xe +1 and x3 -6X-9 (see the remark at the end of Subsection I, 


. 3 
6. Can the polynomials re ~X-1 and X +aX+beQ[X] have any complex 
roots in common? Recall that the polynomial se -X-1 is irreducible over @ (see 


Exercise 11 of §1). 


7. Show that the roots of a polynomial f(X) = se + hee + ve +weR[X] with 


w # 0 cannot all be real. 


8. It is clear that, if a polynomial $f(X) = a_0 X^n + \cdots + a_n \in \mathbb{Z}[X]$ has a root
$c \in \mathbb{Z}$, then c divides the constant term $a_n = f(0)$. Namely, if f(c) = 0, then
$a_n = -c(a_0 c^{n-1} + \cdots + a_{n-1})$. Show that c - 1 divides $f(1) = \sum a_i$, and
that c + 1 divides $f(-1) = \sum (-1)^{n-i} a_i$.

9. Show that

    $f(X) = X^n + a_1 X^{n-1} + \cdots + a_{n-1} X + a_n \in \mathbb{Z}[X]$, f(c) = 0, $c \in \mathbb{Q}$ $\Rightarrow$ $c \in \mathbb{Z}$.

10. Show that any polynomial f(X) with f(x) > 0 for all $x \in \mathbb{R}$ can be written
in the form

    $f(X) = g(X)^2 + h(X)^2$,

where $g, h \in \mathbb{R}[X]$.

11. Give an independent proof of the stability criterion for n = 3, 4. For n = 4
write it in the form: $a_1 > 0$, $a_4 > 0$, $a_1 a_2 > a_3$, $a_3(a_1 a_2 - a_3) > a_1^2 a_4$.


"In recent times, the view has become more 
and more prevalent that many branches of 
mathematics are nothing but the theory of 
invariants of special groups". 


Sophus Lie, 1893 


Part Two 
Groups, Rings, Modules 


The second part of the book can be viewed as a more sophisticated, but, one hopes,
not too abstract continuation of Part I. Relatively few new concepts are introduced. Our old
friends from Chapter 4 reappear, and lead us into areas of much greater depth. The
reader should pay the closest attention to the examples, which take up at least a quarter of
the text (for example, §1 of Ch. 7 and §3 of Ch. 8). Among other things, the examples
are chosen in such a way as to provide a bridge between algebra and other branches of
mathematics. If they serve to strengthen the reader's feeling for the unity of mathematics,
then the author's purpose in Part II can be considered fulfilled.


Further Reading

1. M. F. Atiyah and I. G. Macdonald, Introduction to Commutative Algebra, Addison-Wesley, 1969.
2. T. C. Bartee and G. Birkhoff, Modern Applied Algebra, McGraw-Hill, 1970.
3. Z. I. Borevich and I. R. Shafarevich, Number Theory, Academic Press, 1966.
4. N. Bourbaki, Algebra (Modules, Rings, Forms).
5. J. B. Carrell and J. A. Dieudonné, Invariant Theory Old and New, Academic Press, 1971.
6. P. M. Cohn, Universal Algebra, Harper and Row, 1965.
7. C. C. Faith, Algebra: Rings, Modules and Categories, Springer-Verlag, 1973.
8. M. Hall, The Theory of Groups, Chelsea Pub. Co., 1976.
9. I. N. Herstein, Noncommutative Rings, A.M.S. (J. Wiley), 1968.
10. N. Jacobson, Lie Algebras, Interscience Publ., 1962.
11. A. A. Kirillov, Elements of the Theory of Representations, Springer-Verlag, 1976.
12. A. G. Kurosh, Lectures on General Algebra, Pergamon Press, 1965.
13. G. Liubarskii, The Application of Group Theory in Physics, Pergamon Press, 1960.
14. A. I. Mal'tsev, Algebraic Systems, Springer-Verlag, 1973.
15. L. S. Pontryagin, Topological Groups, Gordon and Breach, 1966.
16. M. M. Postnikov, Foundations of Galois Theory, Pergamon Press, 1962.
17. J.-P. Serre, A Course in Arithmetic, Springer-Verlag, 1973.
18. J.-P. Serre, Linear Representations of Finite Groups, Springer-Verlag, 1977.
19. H. Weyl, The Classical Groups; Their Invariants and Representations, Princeton
    University Press, 1939.
20. D. P. Zhelobenko, Compact Lie Groups and Their Representations, Translated by
    A.M.S., 1973.


Chapter 7. Groups 


This chapter further develops the concept of a group, which was introduced in
Chapter 4. In this chapter we emphasize not so much abstract groups as certain natural
types of group "actions". It was the concrete realizations of groups which gave the impetus
for the development of the general theory and were responsible for its reputation as a
valuable instrument for mathematical investigation.

In examining these special (but important) examples we shall see the key role played
by (homo-, epi-, iso-) morphisms of groups. These group-theoretic constructions allow us
to reduce the study of complicated objects to simpler ones.
§1. Classical groups in low dimensions


1. General definitions. A basic course in linear algebra and geometry supplies us
with examples of groups which deserve an especially detailed investigation. The
transformations of affine, Euclidean, and Hermitian spaces which leave fixed a given point (say, the
origin) lead to the so-called classical groups GL(n), SL(n), O(n), SO(n), U(n), SU(n).
It turns out that these are all examples of so-called Lie groups. One should also include the
symplectic group Sp(n), but we do not intend to describe all of the classical groups;
there are many books where the reader can find treatments of the symplectic group. If n
is small, we speak of the classical groups in low dimensions. We have already encountered
the groups GL(n) and SL(n) in Part I. In our definitions of the other groups we shall
want to avoid dependence on geometry; once one chooses an orthonormal basis in
n-dimensional space, the orthogonal and unitary groups can be defined in terms of matrices:


\[ O(n) = \{A \in M_n(\mathbb{R}) \mid {}^tA \cdot A = A \cdot {}^tA = E\}, \]

\[ SO(n) = \{A \in O(n) \mid \det A = 1\}, \]

\[ U(n) = \{A \in M_n(\mathbb{C}) \mid A^* \cdot A = A \cdot A^* = E\}, \]

\[ SU(n) = \{A \in U(n) \mid \det A = 1\}. \]

Here $A^* = {}^t\bar{A}$ is the matrix obtained from $A = (a_{ij})$ by taking the transpose and then replacing the entries by their complex conjugates. The groups SL(n), SO(n), SU(n) are called the special linear group, the special orthogonal group, and the special unitary group, respectively. In particular,


\[ O(1) = \{\pm 1\}, \quad SO(1) = \{1\}, \]

\[ U(1) = \{e^{i\varphi} \mid 0 \le \varphi < 2\pi\}, \quad SU(1) = \{1\}, \]

\[ SO(2) = \left\{ \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \,\middle|\, 0 \le \varphi < 2\pi \right\} \cong U(1). \]

We have an isomorphism between the groups SO(2) and U(1) given by

\[ \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \mapsto e^{i\varphi}. \]

Since the geometric locus of the complex numbers $e^{i\varphi}$, $0 \le \varphi < 2\pi$, is the unit circle $S^1$ in the plane, it is also customary to say that the group SO(2) and the circle $S^1$ are topologically equivalent. The precise meaning of this statement is explained in a geometry or topology course.

There is a remarkable and much less obvious connection between the groups SU(2) and SO(3). We first discuss a geometric realization of SU(2), which will then lead us to a geometric realization of SO(3).


2. Parametrization of SU(2) and SO(3). According to a famous theorem of Euler, every rigid rotation of $\mathbb{R}^3$, i.e., every element of SO(3), is a rotation about some fixed axis. For example, the matrices

\[ B_\varphi = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad C_\theta = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \qquad (1) \]

correspond, respectively, to rotation about the z-axis through an angle of $\varphi$ and rotation about the x-axis through an angle of $\theta$. If we use the parametrization of rotations by the Euler angles $\varphi, \theta, \psi$, where $0 \le \varphi, \psi < 2\pi$ and $0 \le \theta \le \pi$ (for now, we are not concerned with the geometric meaning of these angles), then any matrix $A \in SO(3)$ can be written in the form

\[ A = B_\varphi C_\theta B_\psi, \qquad (2) \]

where $B_\varphi$, $C_\theta$, and $B_\psi$ are the matrices defined in (1).
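Now let

\[ g = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \in SU(2). \]

We have (using $\det g = 1$ for the inverse)

\[ g^* = \begin{pmatrix} \bar\alpha & \bar\gamma \\ \bar\beta & \bar\delta \end{pmatrix}, \qquad g^{-1} = \begin{pmatrix} \delta & -\beta \\ -\gamma & \alpha \end{pmatrix}. \]

Since $g \in U(2) \Leftrightarrow g^* = g^{-1}$, it follows that $\delta = \bar\alpha$ and $\gamma = -\bar\beta$. Thus, any matrix g in SU(2) has the form

\[ g = \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix}, \qquad |\alpha|^2 + |\beta|^2 = 1. \qquad (3) \]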


Conversely, if g is a matrix of the form (3), then obviously $g \in SU(2)$. Hence, every element of the group SU(2) is uniquely determined by a pair of complex numbers $\alpha, \beta$ such that $|\alpha|^2 + |\beta|^2 = 1$. If we set $\alpha = \alpha_1 + i\alpha_2$ and $\beta = \beta_1 + i\beta_2$ with $\alpha_k, \beta_k \in \mathbb{R}$ and $i = \sqrt{-1}$, then the condition $|\alpha|^2 + |\beta|^2 = 1$ can be written in the form

\[ \alpha_1^2 + \alpha_2^2 + \beta_1^2 + \beta_2^2 = 1. \]

We are therefore justified in saying that the group SU(2) is topologically equivalent (homeomorphic) to the sphere $S^3$ in the four-dimensional space $\mathbb{R}^4$.


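We now consider the unitary matrices

\[ b_\varphi = \begin{pmatrix} e^{i\varphi/2} & 0 \\ 0 & e^{-i\varphi/2} \end{pmatrix}, \qquad c_\theta = \begin{pmatrix} \cos\frac{\theta}{2} & i\sin\frac{\theta}{2} \\ i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}. \qquad (4) \]

As one proves in a basic course on linear algebra (and as is easy to verify directly in this case), given a unitary matrix g of the form (3), there exists a unitary matrix u such that

\[ g = u b_\varphi u^{-1}, \qquad (5) \]

where $\lambda = e^{i\varphi/2}$ is determined from the quadratic equation

\[ \lambda^2 - 2\alpha_1\lambda + 1 = 0. \]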


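We further note that any matrix (3) with $\alpha\beta \ne 0$ can be given the form

\[ g(\varphi,\theta,\psi) = b_\varphi c_\theta b_\psi = \begin{pmatrix} \cos\frac{\theta}{2}\, e^{i(\varphi+\psi)/2} & i\sin\frac{\theta}{2}\, e^{i(\varphi-\psi)/2} \\ i\sin\frac{\theta}{2}\, e^{i(\psi-\varphi)/2} & \cos\frac{\theta}{2}\, e^{-i(\varphi+\psi)/2} \end{pmatrix}, \qquad (6) \]

where

\[ 0 \le \varphi < 2\pi, \quad 0 < \theta < \pi, \quad -2\pi \le \psi < 2\pi. \]

(We shall later see that $\varphi, \theta, \psi$ are the Euler angles. The unitary matrices g and -g correspond to the same rotation in $\mathbb{R}^3$, so that for rotations the range of $\psi$ is restricted to the half-interval $[0, 2\pi)$.) To see this, it suffices to set

\[ |\alpha| = \cos\frac{\theta}{2}, \quad \operatorname{Arg}\alpha = \frac{\varphi+\psi}{2}, \quad |\beta| = \sin\frac{\theta}{2}, \quad \operatorname{Arg}\beta = \frac{\varphi-\psi+\pi}{2}, \]

and use the fact that any complex number z is given by the two real parameters $|z|$ and $\arg z$ ($\operatorname{Arg} z$ is the principal value of the argument $\arg z$).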


We are now ready to resolve the basic problem of this section. 


3. The epimorphism SU(2) → SO(3). To every vector $x = x_1e_1 + x_2e_2 + x_3e_3$ with norm $N(x) = x_1^2 + x_2^2 + x_3^2$ we associate the 2×2 complex matrix

\[ H_x = \begin{pmatrix} x_3 & x_1 + ix_2 \\ x_1 - ix_2 & -x_3 \end{pmatrix}. \qquad (7) \]

The space $M_2^+$ of matrices of the form (7) consists of all Hermitian matrices with zero trace (i.e., $H^* = H$, $\operatorname{tr} H = 0$). The correspondence between vectors $x \in \mathbb{R}^3$ and matrices $H_x \in M_2^+$ is obviously one-to-one. In particular, the basis vectors $e_1, e_2, e_3 \in \mathbb{R}^3$ correspond to the basis matrices $h_i = H_{e_i}$:


\[ h_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad h_2 = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}, \qquad h_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (8) \]

Note that every linear operator $\Phi^+ : H_x \mapsto H_y$ on $M_2^+$ with matrix A in the basis (8) completely determines a linear operator $\Phi : x \mapsto y$ on $\mathbb{R}^3$ with the same matrix A in the basis $e_1, e_2, e_3$, since we have $H_{\lambda x} = \lambda H_x$ and $H_{x+y} = H_x + H_y$. Since these bases are the only ones we shall be using, in what follows we shall often identify operators with the corresponding matrices.


Now let g be a fixed element in the group SU(2). We consider the map

\[ \Phi_g^+ : H_x \mapsto g H_x g^{-1}. \qquad (9) \]

Since similar matrices have the same trace, it follows that $\operatorname{tr} \Phi_g^+(H_x) = \operatorname{tr} H_x = 0$. In addition, $g^* = {}^t\bar{g} = g^{-1}$, and hence

\[ (gH_xg^{-1})^* = (g^{-1})^* H_x^* g^* = gH_xg^{-1}, \]

so that $\Phi_g^+(H_x) \in M_2^+$:

\[ \Phi_g^+(H_x) = H_y = \begin{pmatrix} y_3 & y_1 + iy_2 \\ y_1 - iy_2 & -y_3 \end{pmatrix}, \]

where $y = (y_1, y_2, y_3) \in \mathbb{R}^3$. It is clear from the defining equations (7) and (9) that

\[ \Phi_g^+(\lambda H_x + \mu H_{x'}) = \lambda\,\Phi_g^+(H_x) + \mu\,\Phi_g^+(H_{x'}). \]

Thus, $\Phi_g^+$ (respectively, $\Phi_g$) is a linear map on $M_2^+$ (resp. $\mathbb{R}^3$).


We show that $\Phi_g : \mathbb{R}^3 \to \mathbb{R}^3$ is an orthogonal operator. We have:

\[ N(\Phi_g(x)) = N(y) = y_1^2 + y_2^2 + y_3^2 = -\det H_y = -\det \Phi_g^+(H_x) = -\det gH_xg^{-1} = -\det H_x = x_1^2 + x_2^2 + x_3^2 = N(x), \]

i.e., $\Phi_g$ preserves the norm, and hence also the scalar product. We have not yet established whether or not $\Phi_g$ changes the orientation of $\mathbb{R}^3$; this depends on the sign of $\det \Phi_g$. We only know that $\det \Phi_g = \pm 1$.


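It follows from the definition that

\[ \Phi^+_{g_1}(\Phi^+_{g_2}(H_x)) = g_1(g_2H_xg_2^{-1})g_1^{-1} = (g_1g_2)H_x(g_1g_2)^{-1} = \Phi^+_{g_1g_2}(H_x), \]

and that $\Phi_E$ is the identity matrix of order 3, corresponding to $E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \in SU(2)$. Thus, the map

\[ \Phi^+ : g \mapsto \Phi_g^+ \qquad (\Phi : g \mapsto \Phi_g) \]

is a homomorphism from SU(2) to O(3). The kernel $\operatorname{Ker}\Phi = \operatorname{Ker}\Phi^+$ consists of those unitary matrices g for which $\Phi_g = \Phi_E$. In other words,

\[ \operatorname{Ker}\Phi = \{g \in SU(2) \mid gH = Hg,\ \forall H \in M_2^+\} = \{g \in SU(2) \mid gh_i = h_ig,\ i = 1, 2, 3\}, \]

where $\{h_1, h_2, h_3\}$ is the basis (8) of the space $M_2^+$. A direct verification shows that

\[ \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix} h_i = h_i \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix},\ i = 1, 2, 3 \iff \beta = 0,\ \alpha = \bar\alpha = \pm 1, \]

i.e., $\operatorname{Ker}\Phi = \{\pm E\}$.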
-B @ 


We now consider the images of the unitary matrices (4) under the homomorphism $\Phi$. We carry out the calculation for $\Phi^+_{b_\varphi}$ in the basis (8):

\[ b_\varphi h_1 b_\varphi^{-1} = (\cos\varphi)h_1 + (\sin\varphi)h_2, \]
\[ b_\varphi h_2 b_\varphi^{-1} = -(\sin\varphi)h_1 + (\cos\varphi)h_2, \]
\[ b_\varphi h_3 b_\varphi^{-1} = h_3. \]

Thus (here we feel free to switch from $\Phi^+$ to $\Phi$ and from matrices to operators), $\Phi_{b_\varphi} = B_\varphi$ (see (1)) is rotation of $\mathbb{R}^3$ about the $x_3$-axis through an angle of $\varphi$. If $\varphi$ and u are chosen so that (5) holds (multiplying u by a scalar of modulus 1, we may assume $u \in SU(2)$), then, since $\Phi$ is a homomorphism, we have

\[ \det \Phi_g = \det \Phi_u \cdot \det \Phi_{b_\varphi} \cdot (\det \Phi_u)^{-1} = 1. \]

This shows that $\Phi$ is in fact a homomorphism from SU(2) to SO(3).


We can similarly verify that $\Phi_{c_\theta} = C_\theta$ is rotation about the $x_1$-axis through an angle of $\theta$. Now for any matrix $A \in SO(3)$ we have

\[ A = B_\varphi C_\theta B_\psi = \Phi_{b_\varphi}\Phi_{c_\theta}\Phi_{b_\psi} = \Phi_{b_\varphi c_\theta b_\psi} = \Phi_{g(\varphi,\theta,\psi)}. \]

Hence, the image $\operatorname{Im}\Phi$ contains all of SO(3), and we have proved

THEOREM 1. The group SO(3) is the homomorphic image of SU(2) under the homomorphism $\Phi : g \mapsto \Phi_g$ with kernel $\operatorname{Ker}\Phi = \{\pm E\}$. Each rotation in SO(3) corresponds to precisely two unitary operators g and -g in SU(2). □
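
The statements of Theorem 1 are easy to test numerically. The following Python lines are only a sketch and are not part of the original text: the helper names (su2, h_matrix, coords, phi, det3) are ours, and $H_x$ is assumed to have the form in which $h_3$ is diagonal, in agreement with the basis (8). The function phi builds the 3×3 matrix of $\Phi_g$ column by column from the images $gh_ig^{-1}$.

```python
import cmath
import math

def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def conj_transpose(a):
    """The matrix a* (transpose, then complex conjugate)."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def su2(alpha, beta):
    """Matrix of the form (3); the caller must ensure |alpha|^2 + |beta|^2 = 1."""
    return [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]

def h_matrix(x):
    """Traceless Hermitian matrix H_x (assumed form, with h_3 = diag(1, -1))."""
    x1, x2, x3 = x
    return [[complex(x3, 0), complex(x1, x2)],
            [complex(x1, -x2), complex(-x3, 0)]]

def coords(h):
    """Recover (x1, x2, x3) from a matrix H_x."""
    return (h[0][1].real, h[0][1].imag, h[0][0].real)

def phi(g):
    """Real 3x3 matrix of x -> y, where H_y = g H_x g^{-1} and g^{-1} = g*."""
    g_inv = conj_transpose(g)
    images = [coords(mat_mul(mat_mul(g, h_matrix(e)), g_inv))
              for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
    # images[j] is the image of e_j, i.e. column j of the matrix.
    return [[images[j][i] for j in range(3)] for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# An arbitrary element of SU(2) and its negative.
g = su2(math.cos(0.5) * cmath.exp(0.3j), math.sin(0.5) * cmath.exp(1.1j))
minus_g = [[-z for z in row] for row in g]
R = phi(g)
```

On such examples one finds that R is orthogonal with det3(R) equal to 1 up to rounding, and that phi(g) and phi(minus_g) coincide, as the theorem predicts.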


4. Geometrical characterization of SO(3). Theorem 1 immediately implies the following

COROLLARY. The group SO(3) is topologically equivalent (homeomorphic) to three-dimensional real projective space $\mathbb{R}(P^3)$.

Proof. We saw in Subsection 2 that the elements in SU(2) are in one-to-one correspondence with the points of the sphere $S^3$ in $\mathbb{R}^4$. The two linear operators g and -g in SU(2) correspond to diametrically opposite points in $S^3$, which are glued together (identified) under the homomorphism $\Phi$. We thereby obtain one of the models of the projective space $\mathbb{R}(P^3)$. □


In the usual course on linear algebra and geometry, the projective space $\mathbb{R}(P^n)$ is defined to be the set of straight lines through the origin in $\mathbb{R}^{n+1}$. Each such line intersects the unit sphere $S^n$ (centered at the origin) at precisely two diametrically opposite points. Giving one of these points uniquely determines the line through the origin. But this means that $\mathbb{R}(P^n)$ can be defined as the quotient space of the unit sphere $S^n$ in $\mathbb{R}^{n+1}$ with respect to the equivalence relation that calls two points of $S^n$ equivalent if they are diametrically opposite one another. We are not at this point concerned with giving the topology on $\mathbb{R}(P^n)$.

We have arrived at a remarkable result. The sphere $S^3$ and the space $\mathbb{R}(P^3)$ have a group structure: SU(2) in the first case, and SO(3) in the second case. It turns out that any attempt to define a continuous group structure on $S^2$ or on $\mathbb{R}(P^2)$ is doomed to fail (this fact is tangential to our theme, and will not be proved here).

According to Theorem 1 and its corollary, the group SO(3) is "twice as small" as the group SU(2). Since we have an epimorphism SU(2) → SO(3), it is natural to ask whether there exists a monomorphism SO(3) → SU(2). We shall see in Chapter 8 that this question has a negative answer.

EXERCISES 


1. Fill in the gaps in the proof of Theorem 1, i.e., go through an actual verification (without alluding to courses in linear algebra and geometry) of all the minor assertions, starting with Equation (2).

2. Using the geometrical characterization of SU(2), show that

\[ (0,1,0,0) \cdot (0,0,1,0) = (0,0,0,1) = -(0,0,1,0) \cdot (0,1,0,0) \]

(taking the product of points on $S^3$). These same points (0,1,0,0) and (0,0,1,0) commute when considered on $\mathbb{R}(P^3)$.


3. Show that differentiating the entries in the unitary matrices 


K, (t) = » Ky it) = 2 Kips 




with respect to t and then setting t = 0 leads to the matrices 


iO 1 i i7O i i ijl 0 i 
K = oe = a = next = — = — = = 
1 2/0 | Dee egies 5 ~ oiig agit = oa) 4 
which form a basis of the space M, of skew-hermitian matrices 
a ik, = k, + ik) ae 
9 = ’ ; ? 
k, aF ik) ik, j 
with zero trace: K* = -K trkK =0, 


§2. Group actions on sets


1. Homomorphisms $G \to S(\Omega)$. We began group theory in Chapter 4 with examples of transformation groups, i.e., subgroups of the group $S(\Omega)$ of all one-to-one maps of a set $\Omega$ to itself. This approach is consistent both with the historical path along which group theory developed and with the importance of transformation groups in other areas of mathematics. The so-called abstract theory of groups, which arose in a later era (the first half of our century), has gone far beyond transformation groups, but many of the concepts in this theory bear the imprint of earlier times. In fact, the most common source of these concepts is the idea of a realization (a representation) of a given group G in $S(\Omega)$, where $\Omega$ is some suitably chosen set. By a realization of G in $S(\Omega)$ we mean any homomorphism $\Phi : G \to S(\Omega)$. If $\Phi_g$ is the transformation in $S(\Omega)$ corresponding to $g \in G$, then $\Phi_e$ is the identity map $\Omega \to \Omega$, and we have $\Phi_{gh} = \Phi_g \circ \Phi_h$ for $g, h \in G$. The image $\Phi_g(x)$ of a point (element) $x \in \Omega$ under the transformation $\Phi_g$ is often denoted simply gx; we speak of a map $(g, x) \mapsto gx$ from the cartesian product $G \times \Omega$ to $\Omega$. Perhaps we should be more careful and write $g \circ x$ or $g * x$, so as not to confuse this operation with multiplication in G, but there is usually no need to do this; in practice, there is rarely a danger of ambiguity. We can now write the above properties of $\Phi_g$ in the form

(i) $ex = x$, $x \in \Omega$;

(ii) $(gh)x = g(hx)$, $g, h \in G$.


Any time we have a map $(g, x) \mapsto gx$ from the cartesian product $G \times \Omega$ to $\Omega$ which satisfies (i) and (ii), we say that the group G acts (on the left) on the set $\Omega$, and $\Omega$ is called a G-set. Conversely, if we have a G-set $\Omega$, then we can use the formula

\[ \Phi_g(x) = gx, \quad x \in \Omega, \]

for each $g \in G$ to define a map $\Phi_g : \Omega \to \Omega$. It then follows from (i) and (ii) that the map $\Phi : g \mapsto \Phi_g$ is a homomorphism from G to $S(\Omega)$. It is also customary to say (especially when $|\Omega| < \infty$) that we have a representation $(\Phi, \Omega)$ of G in a permutation group. The kernel $\operatorname{Ker}\Phi$ is called the kernel of the action of G. If $\Phi$ is a monomorphism (in other words: if $gx = x$, $\forall x \in \Omega$ $\Rightarrow$ $g = e$), then we say that G acts effectively on the set $\Omega$.
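
For a small finite group the axioms (i) and (ii) can be checked by brute force. The sketch below is an illustration only (the names OMEGA, mul, act are ours, and permutations are stored as tuples of images, a convention chosen here for convenience): it takes the symmetric group on three points acting on {0, 1, 2} and computes the kernel of the action.

```python
from itertools import permutations

OMEGA = (0, 1, 2)
G = list(permutations(OMEGA))   # g is stored as the tuple (g(0), g(1), g(2))
E = OMEGA                       # the identity permutation

def mul(g, h):
    """Group product gh: apply h first, then g."""
    return tuple(g[h[x]] for x in OMEGA)

def act(g, x):
    """The action (g, x) -> gx."""
    return g[x]

# Axiom (i): ex = x for all x.
axiom_i = all(act(E, x) == x for x in OMEGA)

# Axiom (ii): (gh)x = g(hx) for all g, h, x.
axiom_ii = all(act(mul(g, h), x) == act(g, act(h, x))
               for g in G for h in G for x in OMEGA)

# Kernel of the action: the elements fixing every point.
kernel = [g for g in G if all(act(g, x) == x for x in OMEGA)]
```

Here the kernel consists of the identity alone, so this action is effective.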


Remark. Any action of G on $\Omega$ induces an action of G on $\Omega^k = \Omega \times \dots \times \Omega$ by the obvious rule: $g \cdot (x_1, \dots, x_k) = (gx_1, \dots, gx_k)$. There is also an induced action of G on the set of all subsets $\mathcal{P}(\Omega)$ (see Exercise 4 of §5 Ch. 1). We set $g\varnothing = \varnothing$, and, if T is a non-empty subset of $\Omega$, then we set $gT = \{gt \mid t \in T\}$. The properties (i) and (ii) are easily checked. Clearly, T and gT have the same cardinality; hence, G induces an action on the subsets of a given cardinality.


2. The orbit and stationary subgroup of a point. Two points $x, x' \in \Omega$ are said to be G-equivalent, where G is a group acting on $\Omega$, if $x' = gx$ for some $g \in G$. Using (i) and (ii) in Subsection 1, we easily show that we have reflexivity, symmetry, and transitivity, and hence an equivalence relation which divides $\Omega$ into disjoint equivalence classes. Each equivalence class is called a G-orbit. The orbit containing $x_0 \in \Omega$ is denoted $G(x_0)$; thus, $G(x_0) = \{gx_0 \mid g \in G\}$. But sometimes other notation is used, depending on the special nature of various actions of groups on sets. The notion of an orbit arose from geometry. For example, if G = SO(2) is the group of rotations of the plane about the origin, then the orbit of a point P is the circle centered at the origin passing through P, and the set $\Omega = \mathbb{R}^2$ is the union of all of the concentric circles, including the one with zero radius (consisting of one point, the origin). We have encountered orbits before, in the first part of this book. In Chapter 4 we used orbits to write a permutation $\pi \in S_n$ as a product of disjoint cycles. In that case G was the cyclic group $\langle \pi \rangle$.


Let $x_0$ be a given point in $\Omega$. Consider the set

\[ St(x_0) = \{g \in G \mid gx_0 = x_0\} \subset G. \]

Since $ex_0 = x_0$ and $g, h \in St(x_0) \Rightarrow gh^{-1} \in St(x_0)$, it follows that $St(x_0)$ is a subgroup of G. It is called the stationary subgroup (or the stabilizer) in G of the point $x_0 \in \Omega$, and is often denoted $G_{x_0}$. In the case of the above example of SO(2) acting on $\mathbb{R}^2$, we have St(origin) = SO(2) and St(P) = e if P is not the origin. We always have

\[ gx_0 = g'x_0 \iff g^{-1}g' \in St(x_0) \iff g' \in g\,St(x_0). \]

Thus, the left cosets $g\,St(x_0)$ of the stationary subgroup $St(x_0)$ in G are in one-to-one correspondence with the points in the orbit $G(x_0)$. In particular,

\[ \operatorname{Card} G(x_0) = \operatorname{Card}(G/St(x_0)) = (G : St(x_0)). \qquad (1) \]

Here, as before, $G/St(x_0)$ denotes the quotient set of G by $St(x_0)$, and $(G : St(x_0))$ is the index of the subgroup $St(x_0)$ in G. The cardinality $\operatorname{Card} G(x_0)$ is often called the length of the G-orbit of $x_0$.

From (1) and Lagrange's Theorem it follows that the length of any orbit of a finite group G divides the order of the group.
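
Relation (1) can be observed directly on a small permutation group. In the following sketch (an illustration with names of our own choosing; the group is generated by a 3-cycle on {0, 1, 2} and a transposition of {3, 4}, an example picked for this purpose) we compute the orbit and the stationary subgroup of each point and compare the orbit length with the index.

```python
def mul(g, h):
    """Product gh of permutations stored as tuples of images."""
    return tuple(g[h[x]] for x in range(len(g)))

def generate(gens):
    """Subgroup generated by gens (closure under left multiplication)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            new = mul(s, g)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group

def orbit(G, x):
    return {g[x] for g in G}

def stabilizer(G, x):
    return {g for g in G if g[x] == x}

a = (1, 2, 0, 3, 4)   # the cycle 0 -> 1 -> 2 -> 0
b = (0, 1, 2, 4, 3)   # the transposition of 3 and 4
G = generate([a, b])

# Orbit length versus index of the stationary subgroup, point by point.
orbit_lengths = {x: len(orbit(G, x)) for x in range(5)}
indices = {x: len(G) // len(stabilizer(G, x)) for x in range(5)}
```

The two dictionaries agree, and every orbit length divides the order of G.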


It should be noted that the point $x_0$ in the right side of (1) can be replaced by any other point $x_0' \in G(x_0)$. Thus,

\[ \operatorname{Card} G(x_0) = \operatorname{Card} G(x_0') = (G : St(x_0')). \]

We now give a stronger statement concerning stationary subgroups. Suppose $x_0' = gx_0$. Then

\[ St(x_0')\,g\,x_0 = St(x_0')\,x_0' = x_0' = gx_0, \]

so that

\[ g^{-1}St(x_0')\,g\,x_0 = x_0, \quad \text{i.e.,} \quad g^{-1}St(x_0')\,g \subset St(x_0). \]

Similarly,

\[ g\,St(x_0)\,g^{-1} \subset St(x_0'), \]

since

\[ g\,St(x_0)\,g^{-1}x_0' = g\,St(x_0)\,x_0 = gx_0 = x_0'. \]

Thus, we have the equality

\[ St(x_0') = g\,St(x_0)\,g^{-1} = \{ghg^{-1} \mid h \in St(x_0)\}. \]


Two subgroups $H, H' \subset G$ are called conjugate if $H' = gHg^{-1}$ for some $g \in G$ (see Example 1 below). We can then state the following theorem.

THEOREM 1. Suppose that a group G acts on a set $\Omega$. If two points $x_0, x_0' \in \Omega$ lie in the same orbit, then their stationary subgroups are conjugate:

\[ x_0' = gx_0 \Rightarrow St(x_0') = g\,St(x_0)\,g^{-1}. \]

Further, if G is a finite group, and if $\Omega$ splits up into finitely many orbits

\[ \Omega = \Omega_1 \cup \Omega_2 \cup \dots \cup \Omega_r \]

with representatives $x_1, x_2, \dots, x_r$, then

\[ |\Omega| = \sum_{i=1}^{r} (G : St(x_i)). \qquad (2) \]

Many applications of the "orbit method" to finite groups are based on this formula (2).

3. Examples of group actions on sets. We now discuss some examples which relate to ideas from group theory.


Example 1 (the conjugation action). Taking $\Omega = G$, we have the action of G on G defined by

\[ I_g(x) = gxg^{-1}, \quad \forall x \in G. \]

We could have written $g \circ x = gxg^{-1}$, but we prefer to use our old notation from Subsection 2 of §3 Ch. 4 for the inner automorphism $I_g$ corresponding to $g \in G$. The action of g given by $I_g \in \operatorname{Inn}(G)$ is called conjugation. The kernel of this action is called the center of the group G:

\[ Z(G) = \{z \in G \mid I_g(z) = z,\ \forall g \in G\} = \{z \in G \mid zg = gz,\ \forall g \in G\}. \]

The orbit of an element $x \in G = \Omega$, which in the present context we shall denote $x^G$, is called the conjugacy class of x. If $a, b \in x^G$, we sometimes write $a \underset{G}{\sim} b$. The stationary subgroup St(x), which in this context is called the centralizer of x, is often denoted C(x) (or $C_G(x)$, if it is necessary to indicate the group G in order to avoid confusion).


According to the remark at the end of Subsection 1, the conjugation action carries over to subsets and subgroups of G. Two subsets $H, T \subset G$ are conjugate if $T = gHg^{-1}$ for some $g \in G$. Let H be a subgroup of G. It is customary to call

\[ N(H) = St(H) = \{g \in G \mid gHg^{-1} = H\} \]

the normalizer of H in G. In particular, H is a normal subgroup of G (written $H \triangleleft G$) if N(H) = G, in agreement with the definitions in Chapter 4. Because of the relationship (1), the length of the orbit $H^G$ (the number of subgroups conjugate to H) is equal to the index of the normalizer N(H) in G.


Now suppose that G is a finite group, and that $x_1^G, \dots, x_r^G$ are its conjugacy classes, where the first q of them consist of one element:

\[ x_i^G = \{x_i\}, \quad i = 1, \dots, q \qquad (x_1 = e). \]

Then $Z(G) = \{x_1, x_2, \dots, x_q\}$, and we can write relations (1) and (2) in the form

\[ |x_i^G| = (G : C(x_i)), \qquad (1') \]

\[ |G| = |Z(G)| + \sum_{i=q+1}^{r} (G : C(x_i)). \qquad (2') \]

For example, suppose that $G = S_3$. Then r = 3 and q = 1 (i.e., $Z(S_3) = e$), and we have

\[ S_3 = \{e\} \cup \{(12), (13), (23)\} \cup \{(123), (132)\} \]

for the partition of $S_3$ into conjugacy classes. The sizes of these conjugacy classes (orbit lengths) divide $6 = |S_3|$, as must be the case by (1').
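
The partition of the symmetric group on three letters into conjugacy classes can be recomputed mechanically. The sketch below is only an illustration (function names are ours; the letters are 0, 1, 2 rather than 1, 2, 3): it forms the orbits of the conjugation action, the center, and the centralizers.

```python
from itertools import permutations

def mul(g, h):
    """Product gh of permutations (apply h first, then g)."""
    return tuple(g[h[x]] for x in range(len(g)))

def inv(g):
    """Inverse permutation."""
    r = [0] * len(g)
    for i, gi in enumerate(g):
        r[gi] = i
    return tuple(r)

def conjugacy_classes(G):
    """Orbits of the conjugation action x -> g x g^{-1}."""
    classes, seen = [], set()
    for x in G:
        if x not in seen:
            cls = {mul(mul(g, x), inv(g)) for g in G}
            classes.append(cls)
            seen |= cls
    return classes

def centralizer(G, x):
    return [g for g in G if mul(g, x) == mul(x, g)]

S3 = list(permutations(range(3)))
classes = conjugacy_classes(S3)
sizes = sorted(len(c) for c in classes)
center = [z for z in S3 if all(mul(z, g) == mul(g, z) for g in S3)]

# Right-hand side of the class equation: |Z(G)| plus the non-trivial classes.
class_equation_rhs = len(center) + sum(n for n in sizes if n > 1)
```

The same functions apply to any finite permutation group given as a list of tuples, so the example can be repeated for larger symmetric groups.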


The relation (2') immediately gives us the following interesting fact.

THEOREM 2. Every finite p-group G (i.e., every group of order $p^n > 1$, where p is a prime) has a non-trivial center, i.e., $Z(G) \ne e$.

Proof. If G is an abelian group, then G = Z(G), and there is nothing to prove. Otherwise, r > q, $(G : C(x_i)) = p^{n_i}$, where $n_i \ge 1$ for i > q, and (2') takes the form

\[ p^n = |Z(G)| + \sum_{i=q+1}^{r} p^{n_i}, \]

which shows that $|Z(G)|$ is divisible by p. □


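It is easy to find examples of non-abelian p-groups. Take the group of upper triangular matrices with ones on the diagonal and entries in the finite field of p elements:

\[ P = \left\{ \begin{pmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{pmatrix} \,\middle|\, a, b, c \in \mathbb{Z}_p \right\}. \]

This group P is clearly a non-abelian group of order $p^3$.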
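
For a small prime the group P can be enumerated completely. In the sketch below (an illustration, with p = 3 chosen by us) a matrix with ones on the diagonal and entries a, b, c above it is stored as the triple (a, b, c); the multiplication rule is the one obtained by multiplying two such matrices and reducing modulo p.

```python
from itertools import product

p = 3   # any prime may be used; 3 keeps the enumeration small

def mul(x, y):
    """Product of the unitriangular matrices encoded by (a, b, c) and (a2, b2, c2)."""
    a, b, c = x
    a2, b2, c2 = y
    return ((a + a2) % p, (b + b2 + a * c2) % p, (c + c2) % p)

P = list(product(range(p), repeat=3))

# P is abelian only if every pair of elements commutes.
is_abelian = all(mul(x, y) == mul(y, x) for x in P for y in P)

# The center: elements commuting with everything.
center = [z for z in P if all(mul(z, g) == mul(g, z) for g in P)]
```

In agreement with Theorem 2, the center found this way is non-trivial, although the group itself is not abelian.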


Example 2 (translation). The map $L_a : G \to G$ defined by $L_a(g) = ag$, which we used in the proof of Cayley's Theorem (see §3 Ch. 4), is usually called left translation by a. Since eg = g and (ab)g = a(bg), it follows that the left translations give an action of G on itself, which induces an action of the group G on the set of subsets of G. In particular, let H be a subgroup, and let G/H be the set of left cosets gH, $g \in G$. It is clear that the map

\[ (x, gH) \mapsto x(gH) = (xg)H \]

gives an action, denoted $L^H$, of the group G on the set G/H. The kernel of this action is

\[ \operatorname{Ker} L^H = \{x \in G \mid L^H_x(gH) = gH,\ \forall g \in G\} = \{x \in G \mid xgH = gH,\ \forall g \in G\}. \]

In other words, $x \in \operatorname{Ker} L^H$ if and only if $g^{-1}xg \in H$ for all $g \in G$, i.e., if $x \in gHg^{-1}$, $\forall g \in G$. Thus,

\[ \operatorname{Ker} L^H = \bigcap_{g \in G} gHg^{-1} \]

is the largest normal subgroup of G contained in H. The action of G on G/H is effective if and only if there is no non-trivial normal subgroup of G contained in H.

In any case, if H is any subgroup of index n in G, then we obtain a representation $(L^H, G/H)$ of the group G by permutations of the n cosets of H in G. This representation (which may, however, fail to be a monomorphism) is much more efficient than the one obtained using Cayley's Theorem.
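
The description of the kernel can be tested on a concrete pair. The following sketch is an illustration under our own choice of example: G is the symmetric group on {0, 1, 2, 3} and H is the dihedral subgroup of order 8 generated by a rotation r and a reflection s of the square. We compute the kernel of the action on G/H directly and compare it with the intersection of the conjugates of H.

```python
from itertools import permutations

def mul(g, h):
    """Product gh of permutations (apply h first, then g)."""
    return tuple(g[h[x]] for x in range(len(g)))

def inv(g):
    r = [0] * len(g)
    for i, gi in enumerate(g):
        r[gi] = i
    return tuple(r)

G = list(permutations(range(4)))
E = (0, 1, 2, 3)
r = (1, 2, 3, 0)   # rotation of the square 0, 1, 2, 3
s = (0, 3, 2, 1)   # reflection fixing the vertices 0 and 2

powers = [E, r, mul(r, r), mul(r, mul(r, r))]
H = set(powers) | {mul(q, s) for q in powers}

# The set G/H of left cosets, each stored as a frozenset.
cosets = {frozenset(mul(g, h) for h in H) for g in G}

# Kernel of the action: elements x with x(gH) = gH for every coset.
kernel = {x for x in G
          if all(frozenset(mul(x, y) for y in c) == c for c in cosets)}

# Intersection of all conjugates g H g^{-1}.
core = set(G)
for g in G:
    core &= {mul(mul(g, h), inv(g)) for h in H}
```

Here H has index 3, so the action gives a homomorphism of a group of order 24 into the permutations of three cosets; it cannot be effective, and the kernel is a normal subgroup of order 4.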


Example 3 (transitive groups). A group $G \subset S_n$ of permutations of the set $\Omega = \{1, 2, \dots, n\}$ is called transitive if the orbit G(i) of some element $i \in \Omega$ (and hence of any element of $\Omega$) is all of $\Omega$. In other words, an action $G \times \Omega \to \Omega$ is transitive if for every pair of elements $i, j \in \Omega$ there exists at least one $g \in G$ such that g(i) = j.

Let $\Omega^{[k]}$ be the set of all ordered k-element subsets of $\Omega$. The action of G on $\Omega$ induces an action on $\Omega^{[k]}$. If this action is transitive on $\Omega^{[k]}$, we say that G acts k-transitively on $\Omega$. For example, the symmetric group $S_n$ is n-transitive on $\Omega$, and the alternating group $A_n$ is (n-2)-transitive.

Any group G acts transitively on the set G/H of left cosets of H (see Example 2). To see this, let $g_1H$, $g_2H$ be two cosets. Then $g_2g_1^{-1}(g_1H) = g_2H$. But, remarkably, very little is known about k-transitive groups for k > 5. There is even a century-old conjecture (unproved) of Jordan that there are only two such groups: $S_n$ and $A_n$.

We shall now obtain some interesting quantitative results concerning transitive groups, which we shall need later. Let G be a group acting transitively on $\Omega$. We let $G_i$ denote the stationary subgroup St(i) of a point $i \in \Omega$. We know (see Theorem 1) that if $i = g_i(1)$, then $G_i = g_iG_1g_i^{-1}$, i = 1, 2, ..., n ($g_1 = e$). In addition, the elements $g_i$ can be chosen as left coset representatives for G modulo $G_1$:

\[ G = G_1 \cup g_2G_1 \cup \dots \cup g_nG_1. \qquad (3) \]

In particular, $|G| = n|G_1|$ (which agrees with our general results on the length of orbits in Subsection 2).


THEOREM 3. Let G be a transitive group on $\Omega$, and for any $g \in G$ let N(g) be the number of points of $\Omega$ which g leaves fixed. Then:

(i) $\sum_{g \in G} N(g) = |G|$ (thus, if we divide both sides by $|G|$, we conclude that "on the average" each element of G leaves one point fixed);

(ii) if G is a 2-transitive group, then

\[ \sum_{g \in G} N(g)^2 = 2|G|. \]


Proof. (i). We have

\[ \sum_{g \in G} N(g) = \sum_{j=1}^{n} \Gamma(j), \]

where $\Gamma(j)$ is the number of elements of G leaving the point j fixed, i.e., $\Gamma(j) = |G_j|$. Since G is transitive, we have $|G_j| = |g_jG_1g_j^{-1}| = |G_1|$, where the $g_j$ are as in (3). Hence,

\[ \sum_{g \in G} N(g) = n|G_1| = |G|. \]

(ii). The 2-transitivity condition means that the stationary subgroup $G_1$ acts transitively on the set $\Omega_1 = \Omega \setminus \{1\}$, i.e., the $G_1$-orbits are {1} and $\Omega_1$. Let N'(x) be the number of points in $\Omega_1$ which are left fixed by $x \in G_1$. Applying (i) to $G_1$ and $\Omega_1$, we obtain

\[ \sum_{x \in G_1} N'(x) = |G_1|. \]

Since N(x) = 1 + N'(x) for $x \in G_1$ (because of the point 1), we have

\[ \sum_{x \in G_1} N(x) = 2|G_1|. \]

The same relations hold for all of the other $G_j$:

\[ \sum_{x \in G_j} N(x) = 2|G_j| = 2|G_1|. \]

Summing over j, we obtain

\[ \sum_{j=1}^{n} \sum_{x \in G_j} N(x) = 2n|G_1| = 2|G|. \]

In the double sum N(x) is counted once for every $G_j$ which contains x. But x leaves fixed precisely N(x) points, and so is contained in N(x) subgroups $G_j$. This means that each x contributes $N(x)^2$ to the sum. On the other hand, any element $y \in G$ which is not contained in the union $\bigcup_j G_j$ must permute all of the points, so that N(y) = 0. We can thus write

\[ \sum_{g \in G} N(g)^2 = \sum_{j=1}^{n} \sum_{x \in G_j} N(x) = 2|G|. \qquad \square \]
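
Both parts of Theorem 3 are easy to confirm by direct counting. The sketch below is an illustration with examples of our own choosing: the full symmetric group on four points, which is 2-transitive, and the cyclic group of order 4, which is transitive but not 2-transitive. The function moments returns the two sums that appear in the theorem.

```python
from itertools import permutations

def fixed_points(g):
    """N(g): the number of points left fixed by the permutation g."""
    return sum(1 for i, gi in enumerate(g) if i == gi)

def moments(G):
    """The pair (sum of N(g), sum of N(g)^2) over all g in G."""
    counts = [fixed_points(g) for g in G]
    return sum(counts), sum(n * n for n in counts)

# The symmetric group on {0, 1, 2, 3}: a 2-transitive group.
S4 = list(permutations(range(4)))

# The cyclic group of order 4 acting by shifts: transitive, not 2-transitive.
C4 = [tuple((i + k) % 4 for i in range(4)) for k in range(4)]

s4_moments = moments(S4)
c4_moments = moments(C4)
```

For the cyclic group the first sum still equals the order of the group, but the second sum differs from twice the order, which shows that the 2-transitivity hypothesis in (ii) cannot simply be dropped.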


4. Homogeneous spaces. In geometry it is of particular interest to consider the case when $\Omega$ is a topological space (for example, the line $\mathbb{R}$ or the sphere $S^2$), G is a so-called continuous (or topological) group, and the action $(g, x) \mapsto gx$ satisfies the reasonable requirement:

(iii) f(g, x) = gx is a continuous function of the two variables g and x.

A group G which acts on $\Omega$ in such a way that conditions (i) and (ii) of Subsection 1 and condition (iii) above are all satisfied is called a group of motions of $\Omega$. In many cases these are motions which preserve some metric on $\Omega$. The space $\Omega$ is called homogeneous if G acts transitively in the sense of Example 3, i.e., if all of the points of $\Omega$ belong to the same G-orbit.

From the discussion in Subsections 1 and 2 it is clear that there is a one-to-one correspondence between the points of the homogeneous space $\Omega$ and the cosets in G of one of the stationary subgroups H. To a motion $g \in G$ of $\Omega$ we associate the map $g'H \mapsto gg'H$ on the set G/H.

We now consider our example SO(3) in §1 from this new point of view. The group SO(3) acts on the two-dimensional unit sphere $S^2$. It is obvious that, to any pair of points $P, Q \in S^2$ there corresponds a motion (rotation) taking P to Q. Thus, $S^2$ is a homogeneous space with group SO(3). The stationary subgroup St(P) of any point $P \in S^2$ leaves fixed the entire axis through P and the origin. Hence, $St(P) \cong SO(2)$, the group of rotations in the plane perpendicular to the axis through P.

Since the elements of SO(2) are identified with the points of the unit circle $S^1$, it follows that the group SO(3) can be thought of as a "layer cake" of unit circles "indexed" by the points of the sphere $S^2$: $SO(3)/S^1 = S^2$. In this case we call $SO(3) \to S^2$ a fibration with base $S^2$ and fibre $S^1$ over each point $P \in S^2$. We shall not go further into this subject, which properly belongs in a course on topology.

EXERCISES 


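1. Let $\Phi$ and $\Phi'$ be homomorphisms of the group G into $S(\Omega)$ and $S(\Omega')$, respectively. Then the actions on $\Omega$ and $\Omega'$ are said to be equivalent if there exists a bijective map $\sigma : \Omega \to \Omega'$ such that the following diagram is commutative for all $g \in G$:

\[ \begin{array}{ccc} \Omega & \xrightarrow{\ \Phi_g\ } & \Omega \\ {\scriptstyle\sigma}\downarrow & & \downarrow{\scriptstyle\sigma} \\ \Omega' & \xrightarrow{\ \Phi'_g\ } & \Omega' \end{array} \]

In other words, $\Phi'_g = \sigma\Phi_g\sigma^{-1}$. Prove that any transitive action of a group G is equivalent to the action of G on the left cosets of some subgroup H.

2. Using Theorem 2, prove that any group of order $p^2$ (where p is a prime) is abelian.

3. Prove that the center of the group P at the end of Example 1 is:

\[ Z(P) = \left\{ \begin{pmatrix} 1 & 0 & b \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \,\middle|\, b \in \mathbb{Z}_p \right\}. \]

Find the conjugacy classes in the group P.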


4. Let n be a natural number. Write it as a sum $n = n_1 + n_2 + \dots + n_m$ with $n_1 \ge n_2 \ge \dots \ge n_m \ge 1$. Let p(n) denote the total number of such partitions for all m = 1, 2, ... . Thus, p(3) = 3, p(4) = 5, and so on. Given a permutation $\pi \in S_n$, by writing it as a product of disjoint cycles $\pi = \pi_1\pi_2 \cdots \pi_m$ (see §2 Ch. 4), we obtain a corresponding partition of n. Show that the conjugacy classes in the group $S_n$ are in one-to-one correspondence with the partitions of the integer n.

5. Suppose that $\pi \in S_n$ is a product of r cycles of length 1, s cycles of length 2, t cycles of length 3, and so on, so that $n = r + 2s + 3t + \cdots$. Show that the cardinality of the conjugacy class in $S_n$ which contains $\pi$ is given by the formula

\[ \frac{n!}{1^r\, r!\; 2^s\, s!\; 3^t\, t! \cdots}. \]


6. Suppose that a group G acts on a set $\Omega$. We call a subset $\Gamma \subset \Omega$ invariant under G (or G-invariant) if $gx \in \Gamma$ for all $g \in G$ and $x \in \Gamma$. For example, an invariant set for the action $SO(2) \times \mathbb{R}^2 \to \mathbb{R}^2$ is a set of concentric circles about the origin. Prove that any invariant subset of $\Omega$ is a union of orbits, and that the G-orbit of an element $x \in \Omega$ is the same thing as the smallest invariant subset containing x.

7. Given a group G and a subgroup H, show that the action $H \times G \to G$ defined by $(h, g) \mapsto hg$ gives the partition of G into right cosets of H.


8. By modifying the proof of Theorem 3(i), derive the relation

\[ \sum_{g \in G} N(g) = r(G : \Omega)\,|G|, \]

where $r(G : \Omega)$ is the number of orbits for the action of G on $\Omega$.

§3. Some group theoretic constructions


This section, especially the first subsection, is somewhat more difficult, and we shall have to return to it several times, using different concrete examples to solidify our understanding of the abstract concepts.

1. General theorems on group homomorphisms. In §4 of Chapter 4, we saw that, given a normal subgroup K of a group G, it is possible to construct a new group G/K, which is called the quotient group of G by K. For example, in working with the epimorphism $\Phi : SU(2) \to SO(3)$ (see §1), it is natural to introduce the quotient group $SU(2)/\{\pm E\}$ and compare it with the image $\operatorname{Im}\Phi = SO(3)$. It is not hard to see that $SU(2)/\{\pm E\} \cong SO(3)$, but, in order not to have to go through the argument again each time, it is useful to prove some general facts about subgroups, homomorphisms, and quotient groups. First, recall that the notation $K \triangleleft G$ means that K is a normal subgroup of G.


THEOREM 1 (fundamental homomorphism theorem). Let $\varphi : G \to H$ be a group homomorphism with $K = \operatorname{Ker}\varphi$. Then K is a normal subgroup of G, and $G/K \cong \operatorname{Im}\varphi$. Conversely, if $K \triangleleft G$, then there exists a group H (namely, G/K) and an epimorphism $\pi : G \to H$ having kernel K. ($\pi$ is often called the natural map, natural homomorphism, or natural projection.)


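Proof. We already know that $\operatorname{Ker}\varphi = K \triangleleft G$. We define the map $\bar\varphi : G/K \to H$ by setting

\[ \bar\varphi(gK) = \varphi(g). \]

If $g_1K = g_2K$, then $g_1^{-1}g_2 \in K$, $\varphi(g_1^{-1}g_2) = e$, and so $\varphi(g_1) = \varphi(g_2)$; this means that the map $\bar\varphi$ is well defined (i.e., does not depend on the choice of coset representative). Since $\bar\varphi(g_1K \cdot g_2K) = \bar\varphi(g_1g_2K) = \varphi(g_1g_2) = \varphi(g_1)\varphi(g_2) = \bar\varphi(g_1K)\bar\varphi(g_2K)$, it follows that $\bar\varphi$ is a homomorphism. Actually, $\bar\varphi$ is a monomorphism, because if we had $\bar\varphi(g_1K) = \bar\varphi(g_2K)$ it would follow that $\varphi(g_1) = \varphi(g_2)$, so that $\varphi(g_1^{-1}g_2) = e$, $g_1^{-1}g_2 \in K$, and $g_1K = g_2K$. It is also clear that $\operatorname{Im}\bar\varphi = \operatorname{Im}\varphi$. Hence $\bar\varphi$ is the desired isomorphism from G/K to $\operatorname{Im}\varphi$.

Conversely, suppose $K \triangleleft G$. Take $\pi$ to be the function which associates to any element of G its K-coset, i.e., set $\pi(g) = gK$. It is clear that $\pi$ has the required properties. □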
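
The steps of this proof can be followed on a small example. In the sketch below (an illustration; the choice of the sign homomorphism on the symmetric group of three letters is ours) we form the cosets of the kernel, check that the induced map on cosets is well defined, and check that it is multiplicative.

```python
from itertools import permutations

G = list(permutations(range(3)))

def mul(g, h):
    """Product gh of permutations (apply h first, then g)."""
    return tuple(g[h[x]] for x in range(len(g)))

def sign(g):
    """The homomorphism G -> {1, -1}: parity of the number of inversions."""
    inversions = sum(1 for i in range(len(g)) for j in range(i + 1, len(g))
                     if g[i] > g[j])
    return -1 if inversions % 2 else 1

K = frozenset(g for g in G if sign(g) == 1)                  # the kernel
cosets = {frozenset(mul(g, k) for k in K) for g in G}        # the set G/K

# The induced map on cosets; it is well defined if sign is constant on each coset.
phi_bar = {}
well_defined = True
for c in cosets:
    values = {sign(g) for g in c}
    well_defined = well_defined and len(values) == 1
    phi_bar[c] = values.pop()

# Multiplicativity: the product of two cosets (as sets) is again a coset.
homomorphic = all(
    phi_bar[frozenset(mul(a, b) for a in c1 for b in c2)] == phi_bar[c1] * phi_bar[c2]
    for c1 in cosets for c2 in cosets)
```

The dictionary phi_bar is then a bijection between the two cosets of the kernel and the image {1, -1}.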


It should be noted that giving the kernel of a homomorphism does not determine the homomorphism uniquely. For example, the two automorphisms $g \mapsto g$ and $g \mapsto g^{-1}$ of an abelian group of order a prime p > 2 are different, but their kernels are the same (= e).

If we have a homomorphism $\varphi : G \to G_1$ and a subgroup $H \subset G$, it is natural to consider the restriction $\varphi|_H$ and the image of H under this homomorphism. The following theorem greatly simplifies the investigation of all such situations.


THEOREM 2 (first isomorphism theorem). Let G be a group, and let H and K be subgroups, where K is normal in G. Then HK = KH is a subgroup of G containing K. In addition, the intersection $H \cap K$ is a normal subgroup of H, and the map

\[ \varphi : hK \mapsto h(H \cap K) \]

is an isomorphism of groups:

\[ HK/K \cong H/(H \cap K). \]


Proof. The condition $K \triangleleft G$ can be written in the form gK = Kg, $\forall g \in G$; in particular, hK = Kh for all $h \in H$. The set $HK = \{hk \mid h \in H, k \in K\}$ consists of a certain number of cosets hK, i.e., $HK = \bigcup_{h \in H} hK$. If we replace hK by Kh, we obtain the equality

\[ HK = \bigcup_{h \in H} hK = \bigcup_{h \in H} Kh = KH. \]

It is obvious that the identity element e, which is in both H and K, is contained in HK. Next, $(hk)^{-1} = k^{-1}h^{-1} = h^{-1}(hk^{-1}h^{-1}) \in HK$, so that the inverse of an element in HK lies in HK. Finally, $HK \cdot HK = H \cdot KH \cdot K = H \cdot HK \cdot K = HK$, i.e., the set HK is closed under multiplication. We conclude that $HK \subset G$ is a subgroup of G.

Since $K \subset HK$ and $K \triangleleft G \Rightarrow K \triangleleft HK$, it makes sense to speak of the quotient group HK/K. Let $\pi : G \to G/K$ be the natural epimorphism, and let $\pi_0 = \pi|_H$ be the restriction of $\pi$ to H. The image $\operatorname{Im}\pi_0$ consists of the cosets hK, $h \in H$, i.e., all cosets of K in G which have a representative in H. In other words, $\operatorname{Im}\pi_0 = HK/K$. Thus, we have an epimorphism

\[ \pi_0 : H \to HK/K. \]

Its kernel $\operatorname{Ker}\pi_0$ consists of those $h \in H$ for which $\pi_0(h) = hK = K$, the identity in HK/K. But $hK = K \Leftrightarrow h \in H \cap K$, so that $\operatorname{Ker}\pi_0 = H \cap K$. The subgroup $H \cap K$, like any kernel of a homomorphism, is normal in H (this can also be easily verified directly).

By the fundamental homomorphism theorem (Theorem 1), the map $\bar\pi_0 : h(H \cap K) \mapsto \pi_0(h) = hK$ gives an isomorphism $H/(H \cap K) \cong HK/K$. Note that $\varphi = \bar\pi_0^{-1} : hK \mapsto h(H \cap K)$ is also an isomorphism, from HK/K to $H/(H \cap K)$. □


Since we have given a "first isomorphism theorem", the reader may surmise that we 
shall give a "second isomorphism theorem". This is the case, but we shall give a more 


special, simplified theorem than what usually goes by the name "second theorem". 


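THEOREM 3. Suppose that $G$ is a group with subgroups $H$ and $K$, where $K \triangleleft G$ and $K \subset H$. Then $\bar H = H/K$ is a subgroup of $\bar G = G/K$, and $\pi^*\colon H \mapsto \bar H$ is a one-to-one correspondence between the set $\Omega(G,K)$ of subgroups of $G$ which contain $K$ and the set $\Omega(\bar G)$ of all subgroups of the group $\bar G$. If $H \in \Omega(G,K)$, then $H \triangleleft G \Leftrightarrow \bar H \triangleleft \bar G$, and

\[ G/H \cong \bar G/\bar H = (G/K)/(H/K) . \]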


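Proof. Let $H \in \Omega(G,K)$. From the definition of $G/K$ it immediately follows that $H/K$ is a subgroup of $G/K$. In order to see that the map $\pi^*\colon H \mapsto \bar H$ is injective, suppose that $H_1/K = H_2/K$, where $H_1, H_2 \in \Omega(G,K)$. Then if $h_1 \in H_1$, we have $h_1K = h_2K$ for some $h_2 \in H_2$, i.e., $h_1 = h_2k$ for some $k \in K$, and, since $K \subset H_2$, we have $h_1 \in H_2$. Thus, $H_1 \subset H_2$; we similarly show that $H_2 \subset H_1$, so that $H_1 = H_2$.

We now show that $\pi^*$ is surjective. Suppose that $\bar H \in \Omega(\bar G)$, and let $H$ be the set of all elements of $G$ which are contained in the cosets of $K$ in $\bar H$ (recall that each element of $\bar H$ is a coset of $K$). Then $K \subset H$; also, $a, b \in H \Rightarrow aK, bK \in \bar H \Rightarrow abK = aK \cdot bK \in \bar H \Rightarrow ab \in H$; and finally, $a \in H \Rightarrow aK \in \bar H \Rightarrow a^{-1}K = (aK)^{-1} \in \bar H \Rightarrow a^{-1} \in H$. Thus, $H$ is a subgroup of $G$, and $\bar H = H/K$ ($H$ is usually called the preimage of $\bar H$ in $G$).

The fact that $H \in \Omega(G,K)$, $H \triangleleft G \Rightarrow \bar H \triangleleft \bar G$, is fairly obvious; namely, $gK \cdot hK \cdot (gK)^{-1} = ghg^{-1}K \in \bar H$ for all $g \in G$ and $h \in H$. By the same argument, $\bar H \triangleleft \bar G \Rightarrow ghg^{-1}K \in \bar H \Rightarrow ghg^{-1} \in H$, i.e., $H \triangleleft G$.

Finally, if $H$ is a normal subgroup of $G$ containing $K$, then, by what we have proved, we have two natural epimorphisms

\[ \pi\colon G \to G/K = \bar G; \qquad \bar\pi\colon \bar G \to \bar G/\bar H \]

($\bar\pi(\bar g) = \bar g\bar H$, where $\bar g = gK \in \bar G$), and we can consider the composition, which is an epimorphism

\[ \sigma = \bar\pi \circ \pi\colon G \to \bar G/\bar H \]

defined by: $\sigma(g) = \bar\pi(\bar g) = \bar g\bar H$. We have: $\operatorname{Ker}\sigma = \{g \in G \mid \sigma(g) = \bar H\} = \{g \in G \mid \bar g \in \bar H\} = \{g \in G \mid gK = hK \text{ for some } h \in H\} = H$. Consequently, by the fundamental homomorphism theorem, the map $gH \mapsto \bar g\bar H$ is an isomorphism between $G/H$ and $\bar G/\bar H$. $\square$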


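Example 1. Let $n = dm$ be a natural number with divisor $d > 1$. Obviously, $n\mathbb{Z} \subset d\mathbb{Z}$, and the map $x \mapsto dx + n\mathbb{Z}$ is an epimorphism of additive groups

\[ \mathbb{Z} \to d\mathbb{Z}/n\mathbb{Z} = \{di + n\mathbb{Z} \mid i = 0, 1, \dots, m-1\} \]

with kernel $m\mathbb{Z}$. By Theorem 1, we have an isomorphism

\[ d\mathbb{Z}/n\mathbb{Z} \cong \mathbb{Z}/m\mathbb{Z} = Z_m \]

(this is also easy to see directly). Using Theorem 3, we find

\[ \mathbb{Z}/d\mathbb{Z} \cong (\mathbb{Z}/n\mathbb{Z})/(d\mathbb{Z}/n\mathbb{Z}), \quad\text{i.e.,}\quad Z_d \cong Z_n/Z_m . \]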


If we recall Theorem 5 of §3 Ch. 4, we can conclude that all of the subgroups and quotient 


groups of a cyclic group are themselves cyclic. Of course, this result could also be 


obtained without using the homomorphism theorems. 
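The isomorphisms of Example 1 are easy to check by machine for one particular $n$. The following Python sketch is only an illustration: the values $n = 12$, $d = 4$ and all variable names are our own choices, not taken from the text.

```python
# Illustrative values (our choice): n = d*m with d = 4, m = 3.
n, d = 12, 4
m = n // d

Z_n = set(range(n))                           # Z/nZ, written additively
dZ_mod_n = {(d * i) % n for i in range(m)}    # the subgroup dZ/nZ

# Kernel of x -> dx + nZ, listed by representatives in 0..n-1.
kernel_reps = [x for x in range(n) if (d * x) % n == 0]

# Cosets of dZ/nZ in Z/nZ, i.e. the elements of (Z/nZ)/(dZ/nZ).
quotient = {frozenset((g + h) % n for h in dZ_mod_n) for g in Z_n}

# Order of the coset 1 + dZ/nZ: the least k >= 1 with k in dZ/nZ.
order_of_one = next(k for k in range(1, n + 1) if k % n in dZ_mod_n)

print(len(dZ_mod_n), kernel_reps, len(quotient), order_of_one)
```

The subgroup $d\mathbb{Z}/n\mathbb{Z}$ has $m$ elements, the kernel representatives are the multiples of $m$, and the quotient has $d$ elements and is generated by the coset of $1$, so it is cyclic of order $d$.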


Example 2. Consider the following subgroups of the symmetric group $S_4$:

\[ V_4 = \{e, (12)(34), (13)(24), (14)(23)\} \triangleleft S_4 \quad \text{(see Exercise 4 of §2)}, \]
\[ S_3 = \{e, (12), (13), (23), (123), (132)\} \]

(here $S_3$ is the stationary subgroup of the point $i = 4$). Since obviously $S_3 \cap V_4 = e$, by Theorem 2 the subgroup $H = S_3V_4$ has the property that

\[ H/V_4 \cong S_3/S_3 \cap V_4 \cong S_3 . \]

In particular, $|H| = |V_4|\,|S_3| = 24$, i.e., $H = S_4$. Thus, in addition to a subgroup isomorphic to $S_3$, $S_4$ has a quotient group isomorphic to $S_3$. Applying Theorem 3, we obtain a description of the set $\Omega(S_4, V_4)$ of subgroups of $S_4$ which contain $V_4$:

\[ \Omega(S_4, V_4) = \{V_4,\ \langle(12)\rangle V_4,\ \langle(13)\rangle V_4,\ \langle(23)\rangle V_4,\ A_4 = \langle(123)\rangle V_4,\ S_4\} . \]

Notice that for each divisor $d$ of 24, $S_4$ has at least one subgroup of order $d$. For example, we have exactly four subgroups of order 3, namely $\langle(123)\rangle$, $\langle(124)\rangle$, $\langle(134)\rangle$, and $\langle(234)\rangle$, and three subgroups of order 8, namely $\langle(12)\rangle V_4$, $\langle(13)\rangle V_4$, and $\langle(23)\rangle V_4$. (These are the so-called 3-Sylow and 2-Sylow subgroups.) There are in all two proper normal subgroups (i.e., other than $e$ and $S_4$): $V_4$ and $A_4$. To see this, first suppose that $K \triangleleft S_4$ and $K \cap V_4 \ne e$. Then $K \supset V_4$, since the elements $\ne e$ in $V_4$ are all conjugate. By considering the set $\Omega(S_4, V_4)$, we see that either $K = V_4$ or $K = A_4$. Now suppose that $K \cap V_4 = e$, $K \ne e$. Then

\[ K \triangleleft S_4,\ V_4 \triangleleft S_4 \Rightarrow KV_4 \triangleleft S_4, \]

and so $KV_4 = A_4$ or $S_4$. In either case, it is not hard to show that $K$ could not be a normal subgroup if its intersection with $V_4$ is trivial. We leave the details to the reader.
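The statements of Example 2 can be confirmed by brute force. The sketch below is an illustration under our own conventions, none of which come from the text: permutations are stored as tuples on the points 0, 1, 2, 3 (standing for 1, 2, 3, 4), products are taken right to left, and the helper names are hypothetical.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Product pq of two permutations stored as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

S4 = set(permutations(range(4)))
e = (0, 1, 2, 3)
V4 = {e, (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)}
S3 = {p for p in S4 if p[3] == 3}          # stationary subgroup of the last point

V4_is_normal = all(compose(compose(g, v), inverse(g)) in V4 for g in S4 for v in V4)
HK = {compose(h, k) for h in S3 for k in V4}

# Subgroups containing V4 are unions of V4-cosets that are closed under products.
cosets = {frozenset(compose(g, v) for v in V4) for g in S4}
orders = []
for r in range(1, len(cosets) + 1):
    for combo in combinations(cosets, r):
        H = set().union(*combo)
        if e in H and all(compose(a, b) in H for a in H for b in H):
            orders.append(len(H))
orders.sort()
print(V4_is_normal, HK == S4, orders)
```

The list of orders has six entries, matching the six members of $\Omega(S_4, V_4)$ and the six subgroups of the quotient $S_4/V_4 \cong S_3$.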


2. Solvable groups. The expression

\[ [x, y] = xyx^{-1}y^{-1} \]

is called the commutator of the elements $x$ and $y$ in the group $G$. It is the "correction term" needed in order to reverse the order of multiplication of $x$ and $y$:

\[ xy = [x, y]\,yx . \]

If $x$ and $y$ commute, then $[x,y] = e$. Roughly speaking, one can say that, the more commutators in $G$ are distinct from $e$, the farther $G$ is from being an abelian group. Let $M$ be the set of all commutators in $G$. The commutant (or derived subgroup) of $G$ is defined to be the subgroup $G' = G^{(1)} = [G,G]$ generated by the set $M$ (see §2 Ch. 4):

\[ G' = \langle [x,y] \mid x, y \in G \rangle . \]

Although $[x,y]^{-1} = yxy^{-1}x^{-1} = [y,x]$ is a commutator, it is not true that the product of two commutators is always a commutator. So we define $G'$ to consist of all products of the form

\[ [x_1,y_1][x_2,y_2] \cdots [x_k,y_k] \quad \text{with } x_i, y_i \in G . \]

Of course, when dealing with a specific example it is useful to have a more down-to-earth description of the commutant $G'$.


Example. Let $G = S_n$. The commutator $[\alpha, \beta] = \alpha\beta\alpha^{-1}\beta^{-1}$ of any two permutations $\alpha, \beta \in S_n$ is obviously an even permutation. Hence, $S_n' \subset A_n$. Furthermore,

\[ [(ij),(ik)] = (ij)(ik)(ij)^{-1}(ik)^{-1} = (ij)(ik)(ij)(ik) = (ijk), \]

and, since the 3-cycles $(ijk)$ generate all of the alternating group $A_n$ (see Exercise 8 of §2 Ch. 4), we conclude that $S_n' = A_n$. Note that $S_n' \triangleleft S_n$, and that the quotient group $S_n/S_n'$ is abelian.
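For small $n$ the equality $S_n' = A_n$ can be checked directly from the definition of the commutant. The following sketch is an illustration only; the tuple representation of permutations (points $0, \dots, n-1$, products taken right to left) and all function names are our own assumptions.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def commutator(x, y):
    """[x, y] = x y x^-1 y^-1."""
    return compose(compose(x, y), compose(inverse(x), inverse(y)))

def generated_subgroup(generators):
    """All finite products of the generators (a subgroup, since the group is finite)."""
    generators = list(generators)
    identity = tuple(range(len(generators[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def is_even(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return inversions % 2 == 0

def commutant_of_symmetric_group(n):
    Sn = list(permutations(range(n)))
    return generated_subgroup({commutator(x, y) for x in Sn for y in Sn})

def alternating_group(n):
    return {p for p in permutations(range(n)) if is_even(p)}

for n in (3, 4, 5):
    print(n, commutant_of_symmetric_group(n) == alternating_group(n))
```

With $i, j, k$ stored as $0, 1, 2$, the call `commutator((1, 0, 2), (2, 1, 0))` returns the 3-cycle $0 \to 1 \to 2 \to 0$, in agreement with the formula $[(ij),(ik)] = (ijk)$.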
Returning to the general situation, let us consider an arbitrary group homomorphism $\varphi\colon G \to \bar G$. Since

\[ \varphi([x,y]) = \varphi(xyx^{-1}y^{-1}) = \varphi(x)\varphi(y)\varphi(x)^{-1}\varphi(y)^{-1} = [\varphi(x), \varphi(y)], \]

we have $\varphi(G') \subset (\bar G)'$, and $\varphi(G') = (\bar G)'$ if $\varphi$ is an epimorphism. Now let $K$ be a normal subgroup of $G$, and let $\varphi = I_a\colon g \mapsto aga^{-1}$ be an inner automorphism of $G$, which then induces an automorphism of $K$. By what we have just shown, $I_a(K') \subset K'$ for any $a \in G$, and this means that

\[ K \triangleleft G \Rightarrow K' \triangleleft G . \tag{1} \]

In particular, $G' \triangleleft G$. We now prove a more general fact, which gives the intrinsic meaning of the concept of the commutant.


THEOREM 4. Any subgroup $K \subset G$ which contains the commutant $G'$ is normal in $G$. The quotient group $G/G'$ is abelian, and $G'$ is contained in any normal subgroup $K$ for which $G/K$ is abelian. In particular, the maximal order of an abelian quotient group $G/K$ is equal to the index $(G:G')$.

Proof. If $x \in K$, $g \in G$, and $G' \subset K$, then $gxg^{-1} = (gxg^{-1}x^{-1})x = [g,x]x \in G'K = K$, and hence $K$ is normal in $G$. Next, whenever $G' \subset K$ and $K \triangleleft G$ (in particular, for $K = G'$), we have

\[ [aK, bK] = aK \cdot bK \cdot a^{-1}K \cdot b^{-1}K = aba^{-1}b^{-1}K = [a,b]K = K, \]

i.e., the commutator of any two elements of the quotient group $G/K$ is the identity element $(= K)$. Hence, $G/K$ is an abelian group. Conversely, if $K \triangleleft G$ and $G/K$ is abelian, then

\[ [a,b]K = [aK, bK] = K \]

for all $a, b \in G$. Hence, $[a,b] \in K$, and so $G' \subset K$, since $G'$ is generated by the commutators. $\square$


Remark. We now know two important normal subgroups of any group $G$: the center $Z(G)$ and the commutant $G'$. In general, there is only a weak connection between them, but, roughly speaking, the following principle applies: the "nearer" $G$ is to being abelian, the larger $Z(G)$ is and the smaller $G'$ is. Here is an interesting fact:

The quotient group $G/Z(G)$ of a non-abelian group $G$ by its center cannot be cyclic.

Proof. If $G/Z(G)$ is a cyclic group, then $G = \bigcup_i a^iZ(G)$, and any element of $G$ has the form $g = a^iz$, $z \in Z(G)$. In that case, $[g,h] = [a^iz, a^jz'] = e$ for any two elements $g, h \in G$. Thus, $G' = e$, and $G$ is an abelian group, contradicting our assumptions. $\square$


In $G'$ we can consider the commutant $(G')' = G''$, which is called the second derived group (or second commutant) of the group $G$. Continuing in this manner, we define the $k$-th derived group $G^{(k)} = (G^{(k-1)})'$. By (1), we have $G^{(k)} \triangleleft G$, and of course $G^{(k)} \triangleleft G^{(k-1)}$. We thus have a sequence of normal subgroups

\[ G \triangleright G^{(1)} \triangleright G^{(2)} \triangleright \cdots \triangleright G^{(k)} \triangleright G^{(k+1)} \triangleright \cdots \tag{2} \]

with abelian quotient groups $G^{(k)}/G^{(k+1)}$.

A group $G$ is called solvable if the sequence (2) terminates with the trivial subgroup, i.e., if $G^{(m)} = e$ for some $m$. The least such $m$ in that case is called the level of solvability of $G$. It is obvious that any abelian group (in particular, any cyclic group) is solvable of level 1. Notice that any solvable group $G$ has a normal abelian subgroup $\ne e$, namely $G^{(m-1)}$, if $m$ is the level of solvability. An example of a solvable group is $S_4$: $S_4' = A_4$, $A_4' = V_4$, $V_4' = e$. Thus, the alternating group $A_4$ is solvable of level 2, and the symmetric group $S_4$ is solvable of level 3.

The term "solvable groups" comes from Galois theory, which we alluded to in Subsection 1 of §2 Ch. 1. The solvability of $S_4$ turns out to imply that algebraic
equations of degree $n \le 4$ can always be solved by radicals. The reader can study
these questions in several of the texts listed in the suggestions for further reading at the 


beginning of Part II. 
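The sequence (2) can be computed mechanically for a small permutation group. The sketch below is an illustration under our own assumptions (permutations as tuples on the points $0, \dots, n-1$, products right to left, hypothetical function names); it lists the orders $|G|, |G'|, |G''|, \dots$ until the sequence stops decreasing.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def commutator(x, y):
    return compose(compose(x, y), compose(inverse(x), inverse(y)))

def generated_subgroup(generators):
    """All finite products of the generators (a subgroup, since the group is finite)."""
    generators = list(generators)
    identity = tuple(range(len(generators[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def derived_subgroup(G):
    return generated_subgroup({commutator(x, y) for x in G for y in G})

def derived_series_orders(G):
    """Orders |G|, |G'|, |G''|, ... until the series stops decreasing."""
    orders = [len(G)]
    while True:
        D = derived_subgroup(G)
        if len(D) == len(G):
            return orders
        orders.append(len(D))
        G = D

def symmetric_group(n):
    return set(permutations(range(n)))

for n in (3, 4, 5):
    print(n, derived_series_orders(symmetric_group(n)))
```

A group is solvable exactly when the last order in the list is 1, and the level of solvability is then the length of the list minus one. For $S_5$ the list stops at 60, since $A_5$ coincides with its own commutant.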


3. Simple groups. There exist groups other than $e$ which are equal to their commutant, and, in particular, are not solvable. We shall now show, in fact, that there exist non-abelian groups having no non-trivial ($\ne e$ or $G$) normal subgroups. Such a group is called simple.

LEMMA. Any normal subgroup $K$ of a group $G$ is a union of some set of conjugacy classes in $G$.

Proof. If $x \in K$, then also $gxg^{-1} \in K$ for all $g \in G$, since $K$ is normal. Hence, if $x$ is contained in $K$, then so is the entire conjugacy class $x^G$. So we can write $K = \bigcup_i x_i^G$. $\square$

THEOREM 5. The alternating group $A_5$ is simple.


Proof. Not counting the identity permutation $e$, the group $A_5$ has 15 elements whose square is $e$, namely the permutations $(ij)(kl)$ (there are three such elements in the stationary subgroups of each of the points 1, 2, 3, 4, 5); there are $20 = 2\binom{5}{3}$ elements $(ijk)$ of order 3; and there are $24 = 4!$ elements $(1\,i_1\,i_2\,i_3\,i_4)$ of order 5. The elements of order 2 are all conjugate in $A_5$: they are clearly conjugate in $S_5$, and, since conjugation by the transposition $(ij)$ leaves $(ij)(kl)$ fixed, we can always make two elements of order 2 conjugate using a product of an even number of transpositions, i.e., any two such elements are conjugate in $A_5$. The same holds for the elements of order 3. But the elements of order 5, though they are conjugate in $S_5$, split into two conjugacy classes in $A_5$ with representatives $(12345)$ and $(12354)$. To see this, note that $(45)(12345)(45)^{-1} = (12354)$, and the centralizer (= the stationary subgroup under the conjugation action) of $(12345)$ in $A_5$ is the cyclic group of order 5 generated by $(12345)$. Thus, we have the following table of the number of elements in each conjugacy class in $A_5$:

    1      15         20       12         12
    e    (12)(34)    (123)   (12345)    (12354)

The bottom row gives representatives of the conjugacy classes, and the top row gives the number of elements in each conjugacy class.

Now let $K$ be a normal subgroup of $A_5$. According to the lemma,

\[ |K| = \delta_1 \cdot 1 + \delta_2 \cdot 15 + \delta_3 \cdot 20 + \delta_4 \cdot 12 + \delta_5 \cdot 12, \]

where $\delta_1 = 1$ (since $e \in K$) and $\delta_i = 0$ or $1$ for $i = 2, 3, 4, 5$. It is not hard to see that, since $|K|$ must be a divisor of $|A_5| = 60$ (by Lagrange's Theorem), we are left with only two possibilities:

a) $\delta_2 = \delta_3 = \delta_4 = \delta_5 = 0$, $K = e$;
b) $\delta_2 = \delta_3 = \delta_4 = \delta_5 = 1$, $K = A_5$. $\square$
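Both the table of class sizes and the divisibility argument can be verified by a short computation. The sketch below is an illustration under our own conventions (permutations as tuples on the points 0 to 4, products right to left, hypothetical names): it computes the conjugacy classes of $A_5$ and then lists every sum of class sizes that includes the class of $e$ and divides 60.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return inversions % 2 == 0

A5 = [p for p in permutations(range(5)) if is_even(p)]

def conjugacy_classes(G):
    remaining, classes = set(G), []
    while remaining:
        x = next(iter(remaining))
        cls = {compose(compose(g, x), inverse(g)) for g in G}
        classes.append(cls)
        remaining -= cls
    return classes

class_sizes = sorted(len(c) for c in conjugacy_classes(A5))
other_sizes = class_sizes[1:]          # drop the class {e}, which has size 1

# A normal subgroup has order 1 + (sum of some of the other class sizes),
# and this order must divide |A5| = 60.
candidate_orders = sorted({
    1 + sum(choice)
    for r in range(len(other_sizes) + 1)
    for choice in combinations(other_sizes, r)
    if 60 % (1 + sum(choice)) == 0
})
print(class_sizes, candidate_orders)
```

Only the orders 1 and 60 survive, which is the content of the two cases a) and b).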


Using induction on $n$, it is now possible to establish the following fact, discovered by Galois: all of the groups $A_n$ for $n \ge 5$ are simple. Since any subgroup of a solvable group is solvable ($H \subset G \Rightarrow H^{(k)} \subset G^{(k)}$, $k = 1, 2, \dots$), Theorem 5 implies, in particular, that the symmetric group $S_n$ is not solvable if $n \ge 5$.

THEOREM 6. The rotation group $SO(3)$ is simple.


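Proof. By Theorem 3, it is sufficient to show that any normal subgroup $K$ of $SU(2)$ which contains the kernel $\{\pm E\}$ of the epimorphism $\Phi\colon SU(2) \to SO(3)$ (see Subsection 3 of §1) and is not $\{\pm E\}$, must be all of $SU(2)$. We can interpret the relation (5) in §1 as saying that every conjugacy class in $SU(2)$ contains a diagonal matrix $d_\varphi = b_{2\varphi} = \operatorname{diag}\{e^{i\varphi}, e^{-i\varphi}\}$. Since, by the lemma, $K$ is the union of some set of conjugacy classes in $SU(2)$, without loss of generality we may assume that $d_\varphi \in K$ for some $\varphi > 0$ for which $\sin\varphi \ne 0$. Then $K$ must also contain any commutator

\[ [d_\varphi, g] = d_\varphi g d_\varphi^{-1} g^{-1}
 = \begin{pmatrix} e^{i\varphi} & 0 \\ 0 & e^{-i\varphi} \end{pmatrix}
   \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix}
   \begin{pmatrix} e^{-i\varphi} & 0 \\ 0 & e^{i\varphi} \end{pmatrix}
   \begin{pmatrix} \bar\alpha & -\beta \\ \bar\beta & \alpha \end{pmatrix}
 = \begin{pmatrix} |\alpha|^2 + |\beta|^2 e^{i2\varphi} & \alpha\beta(e^{i2\varphi} - 1) \\ \bar\alpha\bar\beta(1 - e^{-i2\varphi}) & |\alpha|^2 + |\beta|^2 e^{-i2\varphi} \end{pmatrix}, \]

where $|\alpha|^2 + |\beta|^2 = 1$ (see (3) in §1). We obtain the following expression for the trace of the matrix $[d_\varphi, g]$:

\[ \operatorname{tr}[d_\varphi, g] = 2|\alpha|^2 + |\beta|^2(e^{i2\varphi} + e^{-i2\varphi}) = 2(1 - 2|\beta|^2\sin^2\varphi) . \]

Here $|\beta|$ can take any value in $[0,1]$, and $\sin\varphi \ne 0$. Again using (5) of §1, we can find a unitary matrix $h \in SU(2)$ such that $h^{-1}[d_\varphi, g]h = d_\psi = \operatorname{diag}\{e^{i\psi}, e^{-i\psi}\}$, where $d_\psi \in K$. Since $e^{i\psi}$ and $e^{-i\psi}$ are the roots of the characteristic equation

\[ \lambda^2 - 2(1 - 2|\beta|^2\sin^2\varphi)\lambda + 1 = 0 \]

of the matrix $[d_\varphi, g]$, it follows that, if we let $|\beta|$ run through the values from 0 to 1, then we obtain for $\psi$ any point on the interval $[0, 2\varphi]$. Thus, $K$ contains any such $d_\psi$ and also the corresponding conjugacy class, as the parameter $\psi$ varies in $0 \le \psi \le 2\varphi$. But for every $\psi > 0$ there is a natural number $n$ such that $0 < \psi/n \le 2\varphi$. Hence, we may conclude that $K$ contains any given element $d_\psi = (d_{\psi/n})^n$, and, together with it, its whole conjugacy class; that is, $K = SU(2)$. $\square$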
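The trace formula used in the proof can be tested numerically. The following sketch is an illustration only: the sample values of $\varphi$, $\alpha$, $\beta$ and all function names are our own choices, and the matrix $g$ is taken in the form $\begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix}$, which we assume agrees with formula (3) of §1.

```python
import cmath
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def su2(alpha, beta):
    """The matrix [[alpha, beta], [-conj(beta), conj(alpha)]]; unitary if |alpha|^2 + |beta|^2 = 1."""
    return [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]

def unitary_inverse(g):
    """The inverse of a unitary matrix is its conjugate transpose."""
    return [[g[j][i].conjugate() for j in range(2)] for i in range(2)]

def commutator_trace(phi, alpha, beta):
    d = [[cmath.exp(1j * phi), 0j], [0j, cmath.exp(-1j * phi)]]
    g = su2(alpha, beta)
    c = matmul(matmul(d, g), matmul(unitary_inverse(d), unitary_inverse(g)))
    return c[0][0] + c[1][1]

# Sample values (our choice): |beta| = 0.6, |alpha| = 0.8, arbitrary phases.
phi = 0.7
alpha = 0.8 * cmath.exp(0.3j)
beta = 0.6 * cmath.exp(1.1j)

trace = commutator_trace(phi, alpha, beta)
expected = 2 * (1 - 2 * abs(beta) ** 2 * math.sin(phi) ** 2)
psi = math.acos(expected / 2)       # tr d_psi = 2 cos(psi)
print(trace, expected, psi)
```

Since $\operatorname{tr} d_\psi = 2\cos\psi$, the angle $\psi$ is recovered from the trace, and it stays in $[0, 2\varphi]$ as $|\beta|$ runs over $[0,1]$ (here $2\varphi \le \pi$).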


It is already clear from Theorems 5 and 6 that the class of simple groups contains 
many groups, both finite and infinite, which are important in applications. But, surprising 
as it may seem, no one has been able to find a reasonable description of all finite simple 
groups, and it is unclear whether such a description will ever be found. (Note: This was 
true until 1980, but the situation is changing now.) 

4. Products of groups. We now consider a construction which allows us to make
new groups out of the ones we have. We have already encountered special cases of this
construction.


The direct product of two groups $A$ and $B$ is the set $A \times B$ of all ordered pairs $(a,b)$, where $a \in A$ and $b \in B$, with the binary operation

\[ (a_1, b_1)(a_2, b_2) = (a_1a_2, b_1b_2) . \]

Strictly speaking, we should write $(a_1, b_1) * (a_2, b_2) = (a_1 \circ a_2, b_1 \mathbin{\square} b_2)$, where $\circ$, $\square$ and $*$ are the binary operations in $A$, $B$ and $A \times B$, respectively; but, for simplicity, we shall use a dot (or nothing) to denote all of the operations. If we are using additive notation for abelian groups, then we write the direct sum $A \oplus B$.

The group $A \times B$ contains the subgroups $A \times e$ and $e \times B$, which are isomorphic to $A$ and $B$, respectively (here we are using the same letter $e$ to denote the identity in both $A$ and $B$). The map $\varphi\colon A \times B \to B \times A$ given by $\varphi((a,b)) = (b,a)$ is obviously an isomorphism between $A \times B$ and $B \times A$. If we have three groups $A$, $B$, $C$, then we can speak of the direct products $(A \times B) \times C$ and $A \times (B \times C)$. If we set $\psi(((a,b),c)) = (a,(b,c))$, we easily see that

\[ (A \times B) \times C \cong A \times (B \times C) . \]


Because of these properties of "commutativity" and "associativity" of the direct product, we can speak of the direct product of any finite number of groups $G_1, G_2, \dots, G_n$, and write

\[ G_1 \times G_2 \times \cdots \times G_n = \prod_{i=1}^{n} G_i \]


without needing to indicate by parentheses in what order the direct product is taken. (Note 
that this makes the set of all groups into a commutative semigroup, whose elements are 


groups, and whose binary operation is the direct product. ) 


THEOREM 7. Let $G$ be a group with normal subgroups $A$ and $B$. If $A \cap B = e$ and $AB = G$, then $G \cong A \times B$.

Proof. Since $AB = G$, any element $g \in G$ can be written in the form $g = ab$, where $a \in A$ and $b \in B$. If we also had $g = a_1b_1$, $a_1 \in A$, $b_1 \in B$, then $ab = a_1b_1$, and so $a_1^{-1}a = b_1b^{-1} \in A \cap B = e$. Thus, $a_1 = a$, $b_1 = b$, and we conclude that $g$ can be written in the form $ab$ in only one way. Furthermore, $A \triangleleft G \Rightarrow k = a(ba^{-1}b^{-1}) = aa' \in A$; $B \triangleleft G \Rightarrow k = (aba^{-1})b^{-1} = b'b^{-1} \in B$, i.e., the commutator $k = [a,b]$ is in $A \cap B = e$, and so $ab = ba$.

We now define a map $\varphi\colon G \to A \times B$ by setting $\varphi(g) = (a,b)$ for any $g = ab$. We then have $\varphi(gg') = \varphi(aba'b') = \varphi(aa'bb') = (aa', bb') = (a,b)(a',b') = \varphi(ab)\varphi(a'b') = \varphi(g)\varphi(g')$. Furthermore, $\varphi(ab) = (e,e) \Leftrightarrow a = e,\ b = e$, i.e., $\operatorname{Ker}\varphi = e$. The map $\varphi$ is obviously surjective. Thus, $\varphi$ satisfies all of the requirements for an isomorphism between $G$ and $A \times B$. $\square$
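The construction in the proof can be followed step by step on a small example. The sketch below uses the additive cyclic group $Z_6$ with the subgroups $A = \{0, 3\}$ and $B = \{0, 2, 4\}$; this choice of example and the variable names are ours, not the text's.

```python
n = 6
G = set(range(n))          # Z_6 under addition mod 6
A = {0, 3}                 # subgroup of order 2
B = {0, 2, 4}              # subgroup of order 3

# Every g should decompose as g = a + b in exactly one way.
decompositions = {
    g: [(a, b) for a in A for b in B if (a + b) % n == g]
    for g in G
}
unique = all(len(pairs) == 1 for pairs in decompositions.values())

# The map phi(g) = (a, b) from the proof of Theorem 7.
phi = {g: pairs[0] for g, pairs in decompositions.items()}

def add_pairs(x, y):
    """Componentwise operation in A x B."""
    return ((x[0] + y[0]) % n, (x[1] + y[1]) % n)

is_homomorphism = all(phi[(g + h) % n] == add_pairs(phi[g], phi[h]) for g in G for h in G)
is_bijective = len(set(phi.values())) == len(A) * len(B) == len(G)
print(unique, is_homomorphism, is_bijective)
```

All three checks succeed, so $Z_6 \cong Z_2 \times Z_3$ in the sense of Theorem 7 (written additively, $Z_6 = A \oplus B$).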


A group $G$ which satisfies the conditions in Theorem 7 is called the direct product of the subgroups $A$ and $B$. There is a (somewhat pedantic) distinction between this meaning of "direct product" and the earlier meaning, in that now $G$ contains the two groups of which it is the product; strictly speaking, the group $A \times B$, which is the direct product of $A$ and $B$ in the earlier sense, is the direct product of $A \times e$ and $e \times B$ in the new sense. But usually one neglects this distinction, and identifies $A$ with $A \times e$ and $B$ with $e \times B$.


Our next result concerns homomorphisms of direct products. 


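THEOREM 8. Let $G = A \times B$, and let $A_1 \triangleleft A$ and $B_1 \triangleleft B$. Then $A_1 \times B_1 \triangleleft G$, and $G/(A_1 \times B_1) \cong (A/A_1) \times (B/B_1)$. In particular, $G/A \cong B$.

Proof. Let $\pi\colon A \to A/A_1$ and $\rho\colon B \to B/B_1$ be the natural homomorphisms. We define the map $\varphi\colon G \to (A/A_1) \times (B/B_1)$ by setting $\varphi(ab) = (\pi(a), \rho(b))$. We immediately verify that $\varphi$ is a homomorphism with kernel $\operatorname{Ker}\varphi = A_1 \times B_1$ and image $(A/A_1) \times (B/B_1)$. $\square$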


Just as in the theory of vector spaces, it is easy to prove that, if $G$ is a group with normal subgroups $G_1, \dots, G_n$, then $G = \prod_{i=1}^{n} G_i$ if and only if $G = \langle G_1, \dots, G_n \rangle$ and $G_j \cap \langle G_1, \dots, G_{j-1}, G_{j+1}, \dots, G_n \rangle = e$ for all $j$. This same fact can also be expressed as follows: $G$ is the direct product of the normal subgroups $G_1, \dots, G_n$ if every element $g \in G$ can be uniquely expressed in the form $g = g_1g_2 \cdots g_n$, $g_i \in G_i$.

The direct product of $n$ copies of a group $H$ is called the $n$-th cartesian power and is denoted $H^n = H \times \cdots \times H$. One subgroup of $H^n$ which is of special interest is the diagonal $\Delta = \{(h, h, \dots, h) \mid h \in H\}$, which is a group isomorphic to $H$.


If we omit the condition $B \triangleleft G$ in Theorem 7, then we arrive at the notion of a semidirect product: $G = AB$, $A \cap B = e$, $A \triangleleft G$ (sometimes one writes $G = A \leftthreetimes B$). If we define $G$ to be a semidirect product of two groups $A$ and $B$, that does not uniquely determine $G$ until we describe the action of $B$ on the normal subgroup $A$, i.e., in any concrete case we must describe how to compute $bab^{-1}$ for $a \in A$, $b \in B$.

Many of the groups we have encountered are direct or semidirect products. For example, $S_n$ is the semidirect product of the normal subgroup $A_n$ and the cyclic subgroup $\langle(12)\rangle$ of order 2. Using the notation in Example 2 of Subsection 1, we can write: $A_4 = V_4 \leftthreetimes \langle(123)\rangle \cong (Z_2 \times Z_2) \leftthreetimes Z_3$, $S_4 = V_4 \leftthreetimes S_3 \cong (Z_2 \times Z_2) \leftthreetimes (Z_3 \leftthreetimes Z_2)$.

One more example: The group $A(1, \mathbb{R})$ of affine transformations $\mathbb{R} \to \mathbb{R}$ (see Exercise 3 of §2 Ch. 4) is the semidirect product of the normal subgroup of translations and the subgroup $GL(1, \mathbb{R})$ of transformations which leave the point $x = 0$ fixed.
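The last example shows concretely what "describing the action of $B$ on $A$" means. In the sketch below an affine transformation $x \mapsto ax + b$ ($a \ne 0$) is stored as the pair `(a, b)`; this encoding, the use of exact rational numbers, and the function names are our own assumptions made for illustration.

```python
from fractions import Fraction

def compose(f, g):
    """(f o g)(x) = f(g(x)) for affine maps x -> a*x + b stored as pairs (a, b)."""
    a, b = f
    c, d = g
    return (a * c, a * d + b)

def inverse(f):
    a, b = f
    return (1 / a, -b / a)

translation = (Fraction(1), Fraction(5))     # x -> x + 5, in the normal subgroup
scaling = (Fraction(3), Fraction(0))         # x -> 3x, leaves x = 0 fixed

# Action of the subgroup GL(1, R) on translations by conjugation.
conjugate = compose(compose(scaling, translation), inverse(scaling))

# Every affine map factors as (translation) o (map fixing 0).
f = (Fraction(3), Fraction(7))
factored = compose((Fraction(1), f[1]), (f[0], Fraction(0)))
print(conjugate, factored == f)
```

Conjugating the translation by 5 with the scaling by 3 gives the translation by 15: the scaling $x \mapsto ax$ acts on translations by multiplying the shift by $a$, and this rule is exactly the extra datum that determines the semidirect product.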


5. Generators and defining relations. The subject of systems of generators for a group $G$ was already discussed in §2 Ch. 4. We return to this question in order to look at some of the groups we now know from this point of view. It follows from the results of Chapter 4 that in the case of cyclic groups it is not necessary to make a cumbersome Cayley table. Writing

\[ C_n = \langle c \mid c^n = e \rangle \tag{3} \]

gives all possible information about the abstract cyclic group $C_n$ of order $n$; we have: $C_n = \{e, c, c^2, \dots, c^{n-1}\}$ with $c^sc^t = c^{s+t}$ if $s + t < n$ and $c^sc^t = c^{s+t-n}$ if $s + t \ge n$. We can also say that any cyclic group is a homomorphic image of the single group $(\mathbb{Z}, +)$.

In the same way, the "universal" group for all possible direct products $A = \langle a_1 \rangle \times \cdots \times \langle a_r \rangle$ of $r$ cyclic groups is the $r$-th cartesian power $\mathbb{Z}^r = \mathbb{Z} \oplus \cdots \oplus \mathbb{Z}$ (see Subsection 4), which has generators

\[ z_1 = (1, 0, \dots, 0),\quad z_2 = (0, 1, \dots, 0),\quad \dots,\quad z_r = (0, 0, \dots, 1) \]

and addition law

\[ (s_1, \dots, s_r) + (t_1, \dots, t_r) = (s_1 + t_1, \dots, s_r + t_r) . \]

The map $z_i \mapsto a_i$, $1 \le i \le r$, extends uniquely to a group homomorphism $p\colon (s_1, s_2, \dots, s_r) \mapsto a_1^{s_1}a_2^{s_2} \cdots a_r^{s_r}$ with kernel $\operatorname{Ker} p = m_1\mathbb{Z} \oplus \cdots \oplus m_r\mathbb{Z}$ (see Theorem 8), where $m_i$ is the order of $\langle a_i \rangle$ if this is finite, and $m_i = 0$ otherwise.


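In analogy with (3), we can write

\[ A = \langle a_1, \dots, a_r \mid a_1^{m_1} = e, \dots, a_r^{m_r} = e \rangle, \]

where we always implicitly assume when we write this that the generators $a_1, \dots, a_r$ commute with one another. It is customary to call $a_1^{m_1} = e, \dots, a_r^{m_r} = e$ the defining relations for the abelian group $A$, and to call $\mathbb{Z}^r$ the free abelian group of rank $r$ (or with $r$ free generators $z_1, \dots, z_r$). It is obvious that

\[ A \cong \mathbb{Z}^r \iff \bigl[\, a_1^{s_1} \cdots a_r^{s_r} = e \Rightarrow s_1 = s_2 = \dots = s_r = 0 \,\bigr] . \]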


Now if $F_d$ is any group which is generated by $d$ elements $f_1, \dots, f_d$, then any element $f \in F_d$ can be written (perhaps in many ways) in the form

\[ f = f_{i_1}^{s_1} f_{i_2}^{s_2} \cdots f_{i_k}^{s_k}, \quad s_j \in \mathbb{Z}, \tag{4} \]

where $i_j \ne i_{j+1}$, $j = 1, 2, \dots, k-1$. We can always obtain this latter condition using the relations $f_i^s f_i^t = f_i^{s+t}$, $f_i^0 = e$. If the condition $f = e \Leftrightarrow s_1 = \dots = s_k = 0$ holds for every $f$ written in the form (4), we say that $F_d$ is a free group, generated by $d$ free generators. The elements of $F_d$ are often called words in the alphabet $\{f_1, f_1^{-1}, \dots, f_d, f_d^{-1}\}$. The irreducible form (4) of the word $f$ and the length $l(f) = |s_1| + |s_2| + \dots + |s_k|$ are uniquely determined, since otherwise we would have an empty word $e$ (the identity element in $F_d$) with length $> 0$. If $d$ is fixed, then two free groups $F_d$ and $G_d$ with $d$ free generators $f_1, \dots, f_d$ and $g_1, \dots, g_d$, respectively, are isomorphic: we need only set $\Phi(f_i) = g_i$, $1 \le i \le d$, and, given any word $f$ in the form (4), set

\[ \Phi(f) = g_{i_1}^{s_1} g_{i_2}^{s_2} \cdots g_{i_k}^{s_k}, \qquad \Phi(e) = e \]

(the identities in $F_d$ and $G_d$ are denoted by the same symbol). However, if $G_d$ is not a free group, then $\Phi$ will only be an epimorphism, whose kernel $\operatorname{Ker}\Phi$ consists of those words $f$ which become the identity element of $G_d$ after the substitution $f_i \mapsto g_i$. This universal property of $F_d$ (the fact that the substitution $f_i \mapsto g_i$ always extends to an epimorphism $\Phi\colon F_d \to G_d$ whenever $G_d$ is a group with $d$ generators $g_i$) can actually be taken as the definition of a free group with $d$ generators, but we shall not dwell on this point of view.


So as not to give the impression of free groups as something mystical, we give some 


concrete realizations of such groups. 


$d = 1$. $F_1 = (\mathbb{Z}, +)$ is the free abelian group of rank 1, or, equivalently, the infinite cyclic group.

$d = 2$. Let $\mathbb{Z}[t]$ be the ring of polynomials in $t$ with integer coefficients. In the special linear group $SL(2, \mathbb{Z}[t])$ we consider the subgroup $F$ generated by the matrices

\[ A = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix} . \]

We prove that $F$ is a free group. A simple induction on $k$ shows that the element

\[ W_k = A^{\alpha_1}B^{\beta_1} \cdots A^{\alpha_k}B^{\beta_k} \]

with non-zero integer exponents $\alpha_i, \beta_i$ has the form

\[ W_k = \begin{pmatrix} 1 + \dots + \sigma_k t^{2k} & \dots + \sigma_k\beta_k^{-1} t^{2k-1} \\ \dots + \sigma_k\alpha_1^{-1} t^{2k-1} & \dots + \sigma_k\alpha_1^{-1}\beta_k^{-1} t^{2(k-1)} \end{pmatrix}, \]

where $\sigma_k = \alpha_1\beta_1 \cdots \alpha_k\beta_k$, and the dots denote terms of lower degree in $t$. It is clear that $W_k \ne E$. Any element of $F$ can be written either in the form $B^{\beta}A^{\alpha}$, which is $\ne E$, or in the form $W = B^{\beta}W_kA^{\alpha}$. If $W = E$, then $W_k = B^{-\beta}A^{-\alpha}$, which, however, is impossible (compare the degrees if $k > 1$, and do a direct verification if $k = 1$). Thus, $F$ is free.

It is also not hard to show by an additional argument that, if we replace $t$ by any integer $m \ge 2$ in the above example (i.e., we take the subgroup of $SL(2, \mathbb{Z})$ generated by $\begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ m & 1 \end{pmatrix}$), then the resulting group $F$ is still a free group with two generators.
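A computation cannot prove freeness, but it can make the statement tangible: no short non-empty reduced word in the two integer matrices equals $E$. The sketch below is an illustration under our own choices: it takes $m = 2$, encodes the generators and their inverses by the letters `a`, `A`, `b`, `B`, and checks all reduced words up to a length that we picked arbitrarily.

```python
def matmul(X, Y):
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

m = 2
E = ((1, 0), (0, 1))
letters = {
    "a": ((1, m), (0, 1)),     # the first generator
    "A": ((1, -m), (0, 1)),    # its inverse
    "b": ((1, 0), (m, 1)),     # the second generator
    "B": ((1, 0), (-m, 1)),    # its inverse
}
inverse_of = {"a": "A", "A": "a", "b": "B", "B": "b"}

def reduced_words(max_len):
    """Yield (word, matrix) for every non-empty reduced word of length <= max_len."""
    level = list(letters.items())
    for _ in range(max_len):
        yield from level
        # Extend each word by any letter that does not cancel its last letter.
        level = [
            (w + x, matmul(M, letters[x]))
            for w, M in level
            for x in letters
            if x != inverse_of[w[-1]]
        ]

trivial_words = [w for w, M in reduced_words(8) if M == E]
print(trivial_words)
```

The list of trivial words is empty, as the theorem predicts; a reduced word is one in which no letter stands next to its own inverse, which corresponds to the irreducible form (4).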


Definition. Let $F_d$ be a free group with $d$ generators $f_1, \dots, f_d$, let $S = \{w_i,\ i \in I\}$ be some subset of elements $w_i = w_i(f_1, \dots, f_d) \in F_d$, and let $K = \langle S^{F_d} \rangle$ be the smallest normal subgroup of $F_d$ containing $S$ (i.e., the intersection of all normal subgroups containing $S$). We say that a group $G$ is given by $d$ generators $a_1, \dots, a_d \in G$ and the relations $w_i(a_1, \dots, a_d) = e$, $i \in I$, if there exists an epimorphism $\pi\colon F_d \to G$ with kernel $K$ such that $\pi(f_k) = a_k$, $1 \le k \le d$. In this case we write

\[ G = \langle a_1, \dots, a_d \mid w_i(a_1, \dots, a_d) = e,\ i \in I \rangle, \]

and we call $G$ a finitely presented group if $\operatorname{Card} I < \infty$. The group $F_d$ itself is "free of any relations" (whence its name). If a group $H$ with $d$ generators $b_1, \dots, b_d$ has the same relations $w_i(b_1, \dots, b_d) = e$, $i \in I$, and possibly some other relations as well, then $H$ is a homomorphic image of $G$. In particular, $|H| \le |G|$.


Example 1 (the dihedral group). The group $G = \langle a, b \mid a^3 = b^2 = abab = e \rangle$ with two generators and three relations has order $|G| \le 6$, since $ba = a^{-1}b^{-1} = (a^3)a^{-1}b^{-1}(b^2) = a^2b$, and it is not hard to see that the elements $e, a, a^2, b, ab, a^2b$ exhaust $G$. Since the permutations $(123)$ and $(12)$, which generate $S_3$, satisfy the relations $(123)^3 = (12)^2 = (123)(12)(123)(12) = e$, it follows that the map $\varphi\colon G \to S_3$ defined by taking $a \mapsto (123)$ and $b \mapsto (12)$ gives an isomorphism $G \cong S_3$. Thus, the symmetric group $S_3$ is given by two generators and three relations. Recall that $S_3$ can also be identified with the group of all symmetry transformations of an equilateral triangle.


The full group of symmetry transformations of a regular $n$-gon $P_n$ is called a dihedral group, and is denoted $D_n$. The rotation

\[ \mathcal{A} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]

of the polygon about its center through the angle $\theta = 2\pi/n$ generates a cyclic subgroup $\langle \mathcal{A} \rangle$ of order $n$. $D_n$ also contains the reflection $\mathcal{B} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ of $P_n$ relative to an axis passing through the center and one of the vertices. By definition, we have $\mathcal{B}^2 = \mathcal{E}$. The $2n$ distinct symmetry transformations

\[ \mathcal{E}, \mathcal{A}, \dots, \mathcal{A}^{n-1};\quad \mathcal{B}, \mathcal{A}\mathcal{B}, \dots, \mathcal{A}^{n-1}\mathcal{B} \tag{5} \]

exhaust all of $D_n$. To see this, note that any symmetry transformation is determined by its action on the vertices. If a transformation takes 1 to $k$, then it must either preserve the same cyclic order of the vertices, in which case it is $\mathcal{A}^{k-1}$, or else reverse this order, in which case it is $\mathcal{A}^{k-1}\mathcal{B}$. Hence, $D_n$ does not have any elements other than those in (5). Note that $\mathcal{B}\mathcal{A}$ coincides with $\mathcal{A}^{-1}\mathcal{B}$, since both transformations interchange 1 and $n$. Thus, we have the relations

\[ \mathcal{A}^n = \mathcal{E}, \qquad \mathcal{B}^2 = \mathcal{E}, \qquad \mathcal{A}\mathcal{B}\mathcal{A}\mathcal{B} = \mathcal{E} . \]

This means that $D_n$ is a homomorphic image of the group

\[ G = \langle a, b \mid a^n = e,\ b^2 = e,\ (ab)^2 = e \rangle . \]

As in the case $n = 3$, we find that $ba = a^{-1}b^{-1} = a^{n-1}b$, so that any word in the alphabet $\{a, a^{-1}, b, b^{-1}\}$ reduces either to $a^s$ or to $a^sb$, $0 \le s \le n-1$, i.e., $|G| \le 2n$, and, by what we determined before, we must have an isomorphism $G \cong D_n$. We have thereby obtained the dihedral group in terms of generators and relations. We identify $G$ with $D_n$:

\[ D_n = \langle a, b \mid a^n = e,\ b^2 = e,\ (ab)^2 = e \rangle . \]


Since $\langle a \rangle \triangleleft D_n$, and $D_n/\langle a \rangle$ is a cyclic group (of order 2), it follows by Theorem 4 that the commutant $D_n'$ of $D_n$ is contained in $\langle a \rangle$. But $a^2 = aba^{-1}b^{-1} = [a,b] \in D_n'$, so for odd $n$ we have $D_n' = \langle a \rangle$. For even $n$ we have $D_n/\langle a^2 \rangle \cong V_4$, the direct product of two cyclic groups of order 2, and hence $D_n' = \langle a^2 \rangle$. The center $Z(D_n)$ and the number $r$ of conjugacy classes in $D_n$ also depend on whether $n$ is odd or even. We give the information in the following tables (which are easy to verify), in which the representatives of conjugacy classes are in the top row and the number of elements in a class is given in the bottom row:

$n = 2m$: $D_n' = \langle a^2 \rangle$, $(D_n : D_n') = 4$, $Z(D_n) = \langle a^m \rangle$, $r = m + 3$

    e     a^m     a^k (1 <= k <= m-1)     b     ab
    1      1              2               m      m

$n = 2m + 1$: $D_n' = \langle a \rangle$, $(D_n : D_n') = 2$, $Z(D_n) = e$, $r = m + 2$

    e     a^k (1 <= k <= m)     b
    1             2             n


It should be emphasized that the form of the relations (the $w_i$ in the above notation) very much depends on our choice of generators for the group. For example, the dihedral group $D_n$ is generated by any two reflections about axes which intersect at an angle of $\pi/n$. Hence,

\[ D_n = \langle \beta_1, \beta_2 \mid \beta_1^2 = e,\ \beta_2^2 = e,\ (\beta_1\beta_2)^n = e \rangle . \]

If we use the previous definition, we can set $\beta_1 = ab$, $\beta_2 = b$.
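The facts about $D_n$ collected above can be checked for small $n$ by realizing $D_n$ as a group of permutations of the vertices. The sketch below is an illustration under our own conventions: the vertices are numbered $0, \dots, n-1$, the rotation is $i \mapsto i + 1$, the reflection is $i \mapsto -i \pmod n$, and all function names are hypothetical.

```python
def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def power(p, k):
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result

def generated_subgroup(generators):
    generators = list(generators)
    identity = tuple(range(len(generators[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def dihedral_generators(n):
    a = tuple((i + 1) % n for i in range(n))   # rotation through 2*pi/n
    b = tuple((-i) % n for i in range(n))      # reflection fixing vertex 0
    return a, b

def number_of_classes(G):
    remaining, count = set(G), 0
    while remaining:
        x = next(iter(remaining))
        remaining -= {compose(compose(g, x), inverse(g)) for g in G}
        count += 1
    return count

def center(G):
    return {z for z in G if all(compose(z, g) == compose(g, z) for g in G)}

for n in range(3, 9):
    a, b = dihedral_generators(n)
    G = generated_subgroup([a, b])
    print(n, len(G), number_of_classes(G), len(center(G)))
```

For each $n$ the group has $2n$ elements, and the number of classes equals $m + 3$ for $n = 2m$ and $m + 2$ for $n = 2m + 1$.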


Example 2 (the group of quaternions). Unlike in the previous case, we shall define the group of quaternions $Q_8$ (whose name will be explained in Chapter 9) from the very beginning in terms of generators and relations:

\[ Q_8 = \langle a, b \mid a^4 = e,\ b^2 = a^2,\ bab^{-1} = a^{-1} \rangle . \]

Again we have $ba = a^{-1}b = a^3b$, and, since $b^2 = a^2$, any word in the alphabet $\{a, a^{-1}, b, b^{-1}\}$ can be reduced to the form $a^sb^t$, $0 \le s \le 3$, $0 \le t \le 1$; thus, $|Q_8| \le 8$. Can we assert that $|Q_8| = 8$? Yes we can, but only after exhibiting a group of 8 elements having two generators which are subject to the same relations as $a$ and $b$ in $Q_8$. We claim that the group generated by the following matrices is such a group:

\[ A = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} . \]

In fact,

\[ A^2 = B^2 = -E, \qquad A^4 = E, \]

and

\[ BAB^{-1} = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix} = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix} = A^{-1} . \]

The map $a \mapsto A$, $b \mapsto B$ gives an isomorphism $Q_8 \cong \langle A, B \rangle$. Note that $a^2 \in Z(Q_8)$, and since the quotient group of a non-abelian group by its center cannot be cyclic (see the remark in Subsection 2), it follows that $\langle a^2 \rangle = Z(Q_8)$. All groups of order 4 are abelian, and we have $Q_8/Z(Q_8) \cong V_4$, the direct product of two cyclic groups of order 2. Thus, the commutant $Q_8'$ coincides with $Z(Q_8)$, and $(Q_8 : Q_8') = 4$. The information concerning conjugacy classes is contained in the following table:

    e     a^2     a     b     ab
    1      1      2     2      2
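The claims about the matrix realization of $Q_8$ are easy to verify by machine. The sketch below is an illustration only: the pair of matrices used here is one choice that satisfies the defining relations (we assume it for the purposes of the check), and the function names are our own.

```python
def matmul(X, Y):
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def power(X, k):
    result = ((1, 0), (0, 1))
    for _ in range(k):
        result = matmul(result, X)
    return result

E = ((1, 0), (0, 1))
A = ((1j, 0), (0, -1j))
B = ((0, 1j), (1j, 0))

def generated_group(generators):
    """All finite products of the generators, found by repeated right multiplication."""
    elements, frontier = {E}, [E]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = matmul(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

Q8 = generated_group([A, B])

def inverse_in(G, g):
    return next(h for h in G if matmul(g, h) == E)

def class_sizes(G):
    remaining, sizes = set(G), []
    while remaining:
        x = next(iter(remaining))
        cls = {matmul(matmul(g, x), inverse_in(G, g)) for g in G}
        sizes.append(len(cls))
        remaining -= cls
    return sorted(sizes)

center = {z for z in Q8 if all(matmul(z, g) == matmul(g, z) for g in Q8)}
print(len(Q8), class_sizes(Q8), len(center))
```

The generated group has exactly 8 elements, so the upper bound $|Q_8| \le 8$ is attained, and the class sizes 1, 1, 2, 2, 2 agree with the center being $\{E, -E\}$.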


Finitely presented groups, some of the simplest of which we have just considered, 
occur in many areas of mathematics, for example, as the so-called fundamental groups of 
manifolds. It is not surprising that there are still many questions concerning such groups 


which remain open. 
EXERCISES 


1. Recall the definition in Subsection 2 of §3 Ch. 4 of the inner automorphism $I_a\colon g \mapsto aga^{-1}$ and the group $\operatorname{Inn}(G) \subset \operatorname{Aut}(G)$. Show that $\operatorname{Inn}(G) \triangleleft \operatorname{Aut}(G)$, and that $\operatorname{Inn}(G) \cong G/Z(G)$, where $Z(G)$ is the center of $G$. The quotient group $\operatorname{Out}(G) = \operatorname{Aut}(G)/\operatorname{Inn}(G)$ is sometimes called the group of outer automorphisms.

2. Let $H$ and $K$ be subgroups of a group $G$. Show that $|HK| \cdot |H \cap K| = |H| \cdot |K|$ (this is analogous to a well-known formula in linear algebra). Next show that the set $HK$ is a subgroup if and only if $HK = KH$; if $K \triangleleft G$, then this condition automatically holds.

3. Show that, if $G$ is a finite solvable group, then there exists a chain of subgroups $E = G_0 \subset G_1 \subset \dots \subset G_n = G$, where $G_{i-1} \triangleleft G_i$ and each index $(G_i : G_{i-1}) = p_i$ is a prime number.

4. Compose for the symmetric group $S_4$ the table

    [empty two-row table, one column per conjugacy class, to be filled in by the reader]

which is analogous to the one used to prove Theorem 5. Using the same reasoning as in the proof of Theorem 5, repeat the description given in Example 2 of the normal subgroups of $S_4$.


5. Prove that the alternating group $A_n$, where $n \ge 5$, is simple by filling in the details in the following outline.

a) If $K \ne e$ is a normal subgroup of $A_n$, take a permutation $\pi \ne e$ in $K$ which leaves fixed the maximum possible number $k$ of elements of $\Omega = \{1, 2, \dots, n\}$. If $k = n - 3$, then $\pi = (ijk)$ and $K = A_n$ (see Exercise 8 of §2 Ch. 4); so suppose $k < n - 3$.

b) If $\pi = (123\dots)\dots$ is the decomposition of $\pi$ into disjoint cycles, then the fact that $\pi$ is even and $k < n - 3$ implies that $k \le n - 5$. It is also possible for $\pi = (12)(34)\dots$ to be a product of disjoint cycles of length 2.

c) In either case consider the commutator $[\pi, \sigma] = \pi\sigma\pi^{-1}\sigma^{-1} \ne e$ with $\sigma = (123)$, and verify that it leaves more than $k$ points fixed. This contradicts the choice of $k$, and proves our assertion.
6. Show that $Z(A \times B) = Z(A) \times Z(B)$.

7. If $K_1, K_2 \triangleleft G$ and $K_1 \cap K_2 = e$, then $G$ is isomorphic to some subgroup of $(G/K_1) \times (G/K_2)$. Is this true?

8. Let $K \triangleleft G = A \times B$. Prove that either $K$ is abelian, or else one of the intersections $K \cap A$, $K \cap B$ is non-trivial. Give an example of a group $A \times B$ with a non-trivial normal subgroup $K$ for which $K \cap A = e$ and $K \cap B = e$. Thus, $K \triangleleft A \times B$ does not necessarily imply that $K = (K \cap A) \times (K \cap B)$.

9. Is the quaternion group $Q_8$ a semidirect product of two of its proper subgroups?

10. Prove that $H \triangleleft Q_8$ for any proper subgroup $H \subset Q_8$.

11. Show that the groups $D_4$ and $Q_8$ are not isomorphic.

12. Show that $\mathrm{Aut}(D_4) \cong D_4$. (Since $|Z(D_4)| = 2$, by Exercise 1 this then means that $|\mathrm{Out}(D_4)| = 2$.)




13. The set of all complex $p^i$-th roots of 1, $i = 0, 1, 2, \ldots$, forms an infinite group $C(p^\infty)$. It is called quasi-cyclic, since any finite number of elements generate a cyclic group. Verify this, and show that

$$C(p^\infty) = \langle a_1, a_2, a_3, \ldots \mid a_1^p = 1,\ a_{i+1}^p = a_i,\ i = 1, 2, 3, \ldots \rangle.$$


14, Let G = ¢a,blaba = eT a = © =e}, where ne N. Prove 


that $n = 1$, i.e., that $b = e$ and so $G = \langle a \mid a^3 = e \rangle$ is actually a cyclic group of order 3.


15. Fill in the details in the following formal definition of the free group $F_n$ with $n$ generators. Take the alphabet $A = \{a_1, \ldots, a_n, a_1^{-1}, \ldots, a_n^{-1}\}$, which consists of $n$ letters $a_1, \ldots, a_n$ and their "opposites" $a_1^{-1}, \ldots, a_n^{-1}$, and add the symbol $e$. Let $S$ be the set of all "words" obtained by writing out these $2n+1$ symbols in any order, with any possible repetitions, in a row of finite length. We define the product $uv$ of two words $u$ and $v$ to be the word obtained by juxtaposing the word $v$ after $u$. By the inverse of $u = a_{i_1}^{\varepsilon_1} \cdots a_{i_m}^{\varepsilon_m}$, $\varepsilon_k = \pm 1$, we mean the word $u^{-1} = a_{i_m}^{-\varepsilon_m} \cdots a_{i_1}^{-\varepsilon_1}$; $e^{-1} = e$. We introduce an equivalence relation $\sim$ on $S$ as follows. Two words are considered to be equivalent if one is obtained from the other by applying finitely many of the following elementary transformations:


$$ee = e,$$
$$a_i a_i^{-1} = a_i^{-1} a_i = e,$$
$$a_i e = e a_i = a_i,$$
$$a_i^{-1} e = e a_i^{-1} = a_i^{-1}.$$


Each equivalence class has a unique "irreducible" (shortest) word. Multiplication (juxtaposition) of words induces an associative multiplication operation on $\sim$-equivalence classes (and we similarly have inverses defined for equivalence classes). The identity will be the equivalence class of the "empty" word $e$. We finally define our free group $F_n$ to be the set of equivalence classes with this operation.
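The passage from a word to its irreducible representative can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: a word is stored as a list of pairs `(letter, exponent)` with exponent $\pm 1$, the empty list stands for $e$, and the names `reduce_word`, `multiply` and `inverse` are hypothetical helpers, not notation from the text.

```python
def reduce_word(word):
    """Return the irreducible form of a word in the free group.

    A word is a list of (letter, exponent) pairs with exponent +1 or -1;
    the empty list plays the role of the symbol e.
    """
    stack = []
    for letter, exp in word:
        if stack and stack[-1] == (letter, -exp):
            stack.pop()              # cancel a a^{-1} or a^{-1} a
        else:
            stack.append((letter, exp))
    return stack


def multiply(u, v):
    """Product of two classes: juxtapose, then reduce."""
    return reduce_word(u + v)


def inverse(u):
    """Inverse word: reverse the order and flip every exponent."""
    return [(letter, -exp) for letter, exp in reversed(u)]


a, a_inv, b, b_inv = ("a", 1), ("a", -1), ("b", 1), ("b", -1)
w = [a, b, b_inv, a_inv, b, a]
print(reduce_word(w))            # [('b', 1), ('a', 1)]
print(multiply(w, inverse(w)))   # []
```

Processing the letters from left to right with a stack performs every possible cancellation, so the result does not depend on the order in which the elementary transformations are applied.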


Example. A cat with a spool of thread runs in "figure eights" around two columns, each time carrying the thread above the previous loop of thread. (The cat runs in different directions when it comes to the point between the two columns, not necessarily in a continuous figure eight.) The possible paths of the cat starting and ending at the point between the columns are viewed, naturally enough, as elements of a free group $F_2$ with 2 generators. The irreducible words correspond to paths without "retracing of steps".

[Fig. 19]

The segments $a$ and $a^{-1}$, and $b$ and $b^{-1}$, are not drawn right on top of one another in Fig. 19 only for visual clarity. Our example amounts to realizing $F_2$ as the set of classes of "homotopically equivalent paths" (to use the topological terminology) of the lemniscate. In the same way, the fundamental group of Fig. 21 at the end of §3 Ch. 8


is the free group F, é 




§4. The Sylow theorems

In Subsection 4 of §3 Ch. 5, we observed that a finite group $G$ of order $|G|$ might not have a subgroup of order $d$ even if $d$ divides $|G|$. The simplest example is $G = A_4$, $d = 6$.

Since there can be no subgroup of index 2 in a non-abelian simple group (since such a subgroup would be normal), it follows by Theorem 5 of §3 that the alternating group $A_5$ has no subgroup of order 30. Actually, neither does $A_5$ have a subgroup of order 20 or 15. (Why not? Use the arguments in Example 2 of Subsection 3 of §2.) Because of these examples, it is especially remarkable that, more than a century ago, the Norwegian mathematician Sylow was able to prove a series of general facts about subgroups which exist in any group. These facts concern $p$-groups (which we already encountered in §2) which occur as subgroups of a given group $G$.

Let $|G| = p^n m$, where $p$ is a prime number and $m$ is an integer not divisible by $p$. A subgroup $P \subset G$ of order $|P| = p^n$ (if such a subgroup exists) will be called a $p$-Sylow subgroup of $G$. As in Subsection 3 of §2, we let $N(P)$ denote the normalizer of $P$ in $G$.


THEOREM 1 (the first Sylow theorem). There always exists a p-Sylow subgroup. 


THEOREM 2 (the second Sylow theorem). Let $P$ and $P_1$ be two $p$-Sylow subgroups of $G$. Then there exists an element $a \in G$ such that $P_1 = aPa^{-1}$. In other words, all $p$-Sylow subgroups are conjugate.

THEOREM 3 (the third Sylow theorem). The number $N_p$ of $p$-Sylow subgroups in $G$ is equal to $(G : N(P))$ and satisfies the congruence $N_p \equiv 1 \pmod{p}$.
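The three statements can be checked by brute force on a small group. The following Python sketch does this for $A_5$ with $p = 3$ and $p = 5$; it relies on the simplifying assumption that $p$ divides $|G|$ exactly once, so that every $p$-Sylow subgroup is cyclic of order $p$. The helper names (`compose`, `cyclic_subgroup`, `sylow_data`) are ours and purely illustrative.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def is_even(p):
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0


def cyclic_subgroup(g):
    identity = tuple(range(len(g)))
    elements, x = {identity}, g
    while x != identity:
        elements.add(x)
        x = compose(x, g)
    return frozenset(elements)


def sylow_data(group, p):
    """Return (N_p, (G : N(P))); assumes p divides |G| exactly once."""
    sylows = {cyclic_subgroup(g) for g in group if len(cyclic_subgroup(g)) == p}
    P = next(iter(sylows))
    normalizer = [g for g in group
                  if frozenset(compose(compose(g, x), inverse(g)) for x in P) == P]
    return len(sylows), len(group) // len(normalizer)


A5 = [p for p in permutations(range(5)) if is_even(p)]
for p in (3, 5):
    count, index = sylow_data(A5, p)
    print(p, count, index, count % p)
```

For $A_5$ this prints `3 10 10 1` and `5 6 6 1`: the number of $p$-Sylow subgroups equals the index of the normalizer and is congruent to 1 modulo $p$.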

The proofs of Theorems 1-3 illustrate the general methods and concepts that were 


presented in §2. We begin with Theorem 2. 


Proof of Theorem 2. Suppose that $G$ actually has $p$-Sylow subgroups, and that $P$ is one of them. Now let $P_1$ be any $p$-subgroup of $G$, not necessarily a $p$-Sylow one. We make $P_1$ act by left translation on the set $G/P = \bigcup_i g_i P$ of left cosets of $P$ in $G$ (this is a restriction of the action of $G$ on $G/P$ given in §2). According to the results of Subsection 2 of §2, the length of any orbit of this action divides the order $|P_1| = p^k$, $k \le n$. Thus,

$$m = \frac{|G|}{p^n} = |G/P| = p^{k_1} + p^{k_2} + \cdots,$$

where $p^{k_1}, p^{k_2}, \ldots$ are the lengths of the orbits. Since g.c.d.$(m, p) = 1$, it follows that at least one orbit has length $p^{k_i} = p^0 = 1$, i.e.,

$$P_1 \cdot aP = aP \qquad (1)$$

for some element $a = g_i \in G$ (this argument resembles the proof of Theorem 2 of §2). Rewriting (1) in the form

$$P_1 \cdot aPa^{-1} = aPa^{-1},$$

we conclude that

$$P_1 \subset aPa^{-1} \qquad (2)$$

(since $aPa^{-1}$ is a group). In particular, if $P_1$ is a $p$-Sylow subgroup, then $|P| = |P_1|$, and (2) implies that $P_1 = aPa^{-1}$. □


Proof of Theorems 1 and 3. Theorem 1 is a consequence of Theorem 3, since $N_p \equiv 1 \pmod{p}$ implies $N_p \ne 0$, and hence the set of all $p$-Sylow subgroups of $G$ is non-empty. So it suffices to prove Theorem 3.

First, the equality $N_p = (G : N(P))$ follows immediately from the fact that all $p$-Sylow subgroups are conjugate (Theorem 2) and the general fact about the length of an orbit that was given in §2. We shall arrive at the congruence $N_p \equiv 1 \pmod{p}$ by considering a somewhat more general situation. Namely, let $|G| = p^s t$, where $s \le n$ ($t$ may be divisible by $p$), and let $N_p(s)$ be the total number of subgroups of order $p^s$ in $G$. We shall see that we always have the congruence $N_p(s) \equiv 1 \pmod{p}$; in particular, $G$ contains subgroups of any order $p^s$, $s = 1, 2, \ldots, n$. Note that $N_p(n) = N_p$.
Pp , Pp 
To prove this congruence, we argue as follows. The action of $G$ on itself by left translation induces an action of $G$ on the set

$$\Omega = \{M \subset G \mid |M| = p^s\}$$

of all $p^s$-element subsets $\{g_1, \ldots, g_{p^s}\} \subset G$ (see the remark at the end of Subsection 1 of §2). Recall that $g\{g_1, \ldots, g_{p^s}\} = \{gg_1, \ldots, gg_{p^s}\}$. The set $\Omega$ splits up into $G$-orbits $\Omega_i$: $\Omega = \bigcup_i \Omega_i$, so that

$$|\Omega| = \sum_i |\Omega_i|, \qquad |\Omega_i| = (G : G_i),$$

where $G_i = \{g \in G \mid gM_i = M_i\}$ is the stationary subgroup of some representative $M_i \in \Omega_i$.

Since $G_i M_i = M_i$, it follows that $M_i = \bigcup_{j=1}^{\nu} G_i g_j$ is the union of a certain number of right cosets of $G_i$. Hence, $p^s = |M_i| = \nu |G_i|$, so that $|G_i| = p^{s_i} \le p^s$.

In the case $|G_i| < p^s$, we have $|\Omega_i| = p^{s - s_i} t \equiv 0 \pmod{pt}$; while $|G_i| = p^s$ if and only if $|\Omega_i| = t$. We obtain

$$\binom{|G|}{p^s} = |\Omega| \equiv \sum_{|\Omega_i| = t} |\Omega_i| \pmod{pt}. \qquad (3)$$


From what has been said, we have $|\Omega_i| = t \iff |G_i| = p^s \iff M_i = G_i a_i$ ($a_i$ is some element in $G$), and hence $a_i^{-1} M_i = a_i^{-1} G_i a_i = P_i$ is a subgroup of order $p^s$. The orbit $\Omega_i$ consists of the various cosets $gP_i$ of $P_i$.

Conversely, every subgroup $H \subset G$ of order $|H| = p^s$ leads to an orbit $\{gH \mid g \in G\}$ of length $t$. Different subgroups $H_1$, $H_2$ of order $p^s$ lead to different orbits, since if we had $H_1 = gH_2$, then we would have $e = gh_2$ for some $h_2 \in H_2$, which gives $g \in H_2$ and $H_1 = H_2$. Thus, there is a one-to-one correspondence between subgroups of order $p^s$ and orbits $\Omega_i$ of length $t$. The congruence (3) can now be written in the form

$$\binom{|G|}{p^s} \equiv \sum_{|\Omega_i| = t} |\Omega_i| = t N_p(s) \pmod{pt}. \qquad (4)$$
p IQ, =t 


So far we have used nothing about the particular group $G$ except for its order. If we take $G$ to be the cyclic group of order $p^s t$, then the $N_p(s)$ for this $G$ would be 1 (Theorem 5 of §3 Ch. 4), and so we must have

$$\binom{|G|}{p^s} \equiv t \pmod{pt}. \qquad (5)$$

Since the left sides of (4) and (5) coincide (i.e., $|\Omega|$ for our original $G$ is the same as $|\Omega|$ for the cyclic group of order $p^s t$), it follows that

$$t N_p(s) \equiv t \pmod{pt},$$

which gives us the desired congruence $N_p(s) \equiv 1 \pmod{p}$. □
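The purely arithmetic consequence of the argument, the congruence $\binom{p^s t}{p^s} \equiv t \pmod{pt}$, is easy to test numerically. The sketch below checks it over a small range of parameters; the range itself is an arbitrary choice of ours.

```python
from math import comb


def binomial_congruence_holds(p, s, t):
    """Check that C(p^s * t, p^s) is congruent to t modulo p*t."""
    return comb(p**s * t, p**s) % (p * t) == t % (p * t)


checked = [(p, s, t)
           for p in (2, 3, 5, 7)
           for s in range(0, 4)
           for t in range(1, 13)]
assert all(binomial_congruence_holds(p, s, t) for p, s, t in checked)
print(comb(8, 4), comb(8, 4) % (2 * 2))   # 70 2  (p = 2, s = 2, t = 2)
```

Note that $t$ is allowed to be divisible by $p$, exactly as in the proof.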


Although we have actually proved more than stated in the theorems, we shall not have 
occasion to use any deeper results, and so refer the interested reader to more specialized 


books. 


Example. Let $G = SL(2, \mathbb{Z}_p)$ be the group of all $2 \times 2$ matrices with determinant 1 over the field $\mathbb{Z}_p$ of $p$ elements. If we write the general linear group $GL(2, \mathbb{Z}_p)$ as a union of cosets of $SL(2, \mathbb{Z}_p)$:

$$GL(2, \mathbb{Z}_p) = \bigcup_{a \in \mathbb{Z}_p^*} \begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix} SL(2, \mathbb{Z}_p),$$

we see that

$$|GL(2, \mathbb{Z}_p)| = (p-1)\,|SL(2, \mathbb{Z}_p)|. \qquad (6)$$

If we think of $GL(2, \mathbb{Z}_p)$ as the group of automorphisms of a two-dimensional vector space $V$ over $\mathbb{Z}_p$, we can easily find $|GL(2, \mathbb{Z}_p)|$. Namely, $GL(2, \mathbb{Z}_p)$ acts on the set of bases $\{v_1, v_2\}$ of $V$. The image of $v_1$ can be any non-zero vector $f_1 \in V$ (there are $p^2 - 1$ such $f_1$), and given a choice of $f_1$, the image of $v_2$ can be any vector $f_2$ in $V \setminus \langle f_1 \rangle$ (there are $p^2 - p$ such vectors). Hence, $|GL(2, \mathbb{Z}_p)| = (p^2 - 1)(p^2 - p)$, which, combined with (6), gives the formula

$$|SL(2, \mathbb{Z}_p)| = p(p^2 - 1).$$
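Both order formulas can be confirmed by direct enumeration for small primes. In the sketch below a matrix is simply a 4-tuple `(a, b, c, d)` of residues modulo $p$; the function name `group_orders` is an illustrative choice.

```python
from itertools import product


def group_orders(p):
    """Count invertible and determinant-1 matrices over Z_p by brute force."""
    gl = sl = 0
    for a, b, c, d in product(range(p), repeat=4):
        det = (a * d - b * c) % p
        if det != 0:
            gl += 1
        if det == 1:
            sl += 1
    return gl, sl


for p in (2, 3, 5, 7):
    gl, sl = group_orders(p)
    assert gl == (p**2 - 1) * (p**2 - p)
    assert sl == p * (p**2 - 1)
    assert gl == (p - 1) * sl       # relation (6)
    print(p, gl, sl)
```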


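We can exhibit at least two $p$-Sylow subgroups of $SL(2, \mathbb{Z}_p)$ right away:

$$P = \left\{ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \,\middle|\, a \in \mathbb{Z}_p \right\}, \qquad P' = \left\{ \begin{pmatrix} 1 & 0 \\ a & 1 \end{pmatrix} \,\middle|\, a \in \mathbb{Z}_p \right\}.$$

By Theorem 3, we have

$$N_p = (G : N(P)) = 1 + kp > 1.$$

Since $N(P)$ contains the subgroup

$$\left\{ \begin{pmatrix} \lambda & a \\ 0 & \lambda^{-1} \end{pmatrix} \,\middle|\, \lambda \in \mathbb{Z}_p^*,\ a \in \mathbb{Z}_p \right\}$$

of order $p(p-1)$, it follows that the only possibility is

$$N_p = p + 1.$$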
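The count $N_p = p + 1$ can also be observed experimentally. Since $|SL(2, \mathbb{Z}_p)| = p(p^2 - 1)$, a $p$-Sylow subgroup has order $p$ and is generated by any of its non-identity elements, so it is enough to collect the cyclic subgroups of order $p$. The sketch below does this; matrices are again 4-tuples, and the helper names are ours.

```python
from itertools import product


def mat_mul(x, y, p):
    """Product of 2x2 matrices (a, b, c, d) = [[a, b], [c, d]] modulo p."""
    (a, b, c, d), (e, f, g, h) = x, y
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def sylow_p_subgroups(p):
    identity = (1, 0, 0, 1)
    sl2 = [m for m in product(range(p), repeat=4)
           if (m[0] * m[3] - m[1] * m[2]) % p == 1]
    subgroups = set()
    for g in sl2:
        powers, x = [identity], g
        while x != identity:          # list the cyclic subgroup generated by g
            powers.append(x)
            x = mat_mul(x, g, p)
        if len(powers) == p:          # g has order exactly p
            subgroups.add(frozenset(powers))
    return subgroups


for p in (2, 3, 5):
    assert len(sylow_p_subgroups(p)) == p + 1
```

The two subgroups $P$ and $P'$ of upper and lower unitriangular matrices are among the $p + 1$ subgroups found.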




With $p = 2$, we can obtain an isomorphism between the symmetric group $S_3$ and the group $SL(2, \mathbb{Z}_2)$ (note that both groups are given by the same generators and relations). If $p > 2$, the group $G = SL(2, \mathbb{Z}_p)$ has center $Z(G) = \{\pm E\}$ of order 2. The quotient group $PSL(2, \mathbb{Z}_p) = G/Z(G)$ is called the projective special linear group (it is the group of transformations of the projective line $\mathbb{Z}_p \cup \{\infty\} = \{0, 1, \ldots, p-1\} \cup \{\infty\}$). It has played an important role in algebra since the time of Galois. It turns out that, when $p > 3$, $PSL(2, \mathbb{Z}_p)$ is a simple group, and provided, along with $A_n$, one of the earliest known examples of a finite simple group.


We return to the general situation, and give a useful refinement of Sylow's theorems. 


THEOREM 4. (i) A $p$-Sylow subgroup $P$ of $G$ is normal in $G$ if and only if $N_p = 1$.

(ii) A finite group $G$ of order $|G| = p_1^{n_1} \cdots p_k^{n_k}$ is a direct product of the $p_i$-Sylow subgroups $P_1, \ldots, P_k$ if and only if these subgroups are normal in $G$.


Proof. (i) By the second Sylow theorem, all of the Sylow subgroups corresponding to a given prime divisor $p$ of $|G|$ are conjugate, and, if $P$ is one such subgroup, then $N_p = 1 \iff xPx^{-1} = P,\ \forall x \in G \iff P \triangleleft G$.

(ii) If $G = P_1 \times \cdots \times P_k$ is the direct product of its Sylow subgroups, then $P_i \triangleleft G$, since it is a direct factor in $G$. Hence, normality of the Sylow subgroups is a necessary condition.


Now suppose that $P_i \triangleleft G$, $1 \le i \le k$, i.e., $N_{p_i} = 1$. In the first place, we note that, for $i \ne j$,

$$x \in P_i \cap P_j \implies x^{p_i^{n_i}} = e,\ x^{p_j^{n_j}} = e \implies x = e.$$

Hence, $P_i \cap P_j = e$, and so for any $x_i \in P_i$, $x_j \in P_j$ we have

$$[x_i, x_j] = x_i x_j x_i^{-1} x_j^{-1} = \begin{cases} (x_i x_j x_i^{-1})\, x_j^{-1} \in P_j \\ x_i\, (x_j x_i^{-1} x_j^{-1}) \in P_i \end{cases} \implies [x_i, x_j] \in P_i \cap P_j = e,$$

i.e., the elements $x_i$ and $x_j$ commute with one another.

Suppose that we could write the identity element $e \in G$ in the form $e = y_1 y_2 \cdots y_k$, where $y_i \in P_i$ is an element of order $b_i$ (a power of $p_i$). Setting $a_j = \prod_{i \ne j} b_i$ and using the fact that the $y_i$ commute with one another, we obtain

$$e = (y_1 y_2 \cdots y_k)^{a_j} = y_1^{a_j} y_2^{a_j} \cdots y_k^{a_j} = y_j^{a_j}.$$

But, since $a_j$ and $b_j$ are relatively prime, the fact that $y_j^{b_j} = e$ and $y_j^{a_j} = e$ implies that $y_j = e$. This is true for any $j$, and so $e = y_1 y_2 \cdots y_k$ is only possible if $y_1 = y_2 = \cdots = y_k = e$.


On the other hand, any element $x \in G$ of order $r = r_1 r_2 \cdots r_k$, where $r_i$ is a power of $p_i$, can be written in the form

$$x = x_1 x_2 \cdots x_k, \qquad |\langle x_i \rangle| = r_i, \quad i = 1, 2, \ldots, k. \qquad (7)$$

To see this, it suffices to set $x_i = x^{t_i r_i'}$, where $t_i$ and $r_i'$ are determined from the conditions:

$$r_i' = r / r_i, \qquad 1 = \sum_{i=1}^{k} t_i r_i'.$$

Now if $x = x_1' x_2' \cdots x_k'$ is another expression for $x$ in the form (7), then, since the $x_i$ and $x_j'$ with different subscripts commute, we have

$$e = x x^{-1} = (x_1' x_2' \cdots x_k')(x_1 x_2 \cdots x_k)^{-1} = (x_1' x_1^{-1})(x_2' x_2^{-1}) \cdots (x_k' x_k^{-1}),$$

which, as shown above, implies that $x_1' x_1^{-1} = \cdots = x_k' x_k^{-1} = e$, i.e., $x_i' = x_i$ for $i = 1, \ldots, k$.

Thus, every element in $G$ can be written uniquely in the form (7) with $x_i \in P_i$. By the results of §3, we have $G = P_1 \times \cdots \times P_k$. □


Remark. A normal $p$-Sylow subgroup $P$ is invariant under the action of any automorphism $\varphi \in \mathrm{Aut}(G)$, since $|\varphi(P)| = |P|$, and so $\varphi(P)$ is also a $p$-Sylow subgroup; hence, $\varphi(P) = P$ if $N_p = 1$.

In conclusion, we note that analogs of Sylow subgroups have been studied in algebraic 


structures which are very different from finite groups. 
EXERCISES 


1. Find the number of 5-Sylow subgroups in $A_5$.


2. Verify that the set P of matrices 


ef ah. =f ap 


over $\mathbb{Z}_3$ is a group isomorphic to the quaternion group $Q_8$, and that $P$ is a 2-Sylow subgroup of $SL(2, \mathbb{Z}_3)$. Show that $P \triangleleft SL(2, \mathbb{Z}_3)$.


3. Show that the groups $S_4$ and $SL(2, \mathbb{Z}_3)$ are not isomorphic. Are the groups $PSL(2, \mathbb{Z}_3)$ and $A_4$ isomorphic?
4. Prove that every group $G$ of order $pq$ (where $p$ and $q$ are prime numbers, $p < q$) is either cyclic, or else non-abelian with a normal $q$-Sylow subgroup, and that the second possibility can occur if and only if $q - 1$ is divisible by $p$. For example, all groups of order 15 are cyclic.


5. Give a new proof of the congruence $(p-1)! + 1 \equiv 0 \pmod{p}$ for $p$ a prime (see §1 Ch. 6), by making a direct computation of the number $N_p$ of $p$-Sylow subgroups in the symmetric group $S_p$.

§5. Finite abelian groups

In an abelian group all subgroups are normal. This obvious fact, together with Theorem 4 of §4, immediately implies that any abelian group $A$ of order $|A| = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$ is the direct product of its Sylow subgroups $A(p_i)$:

$$A = A(p_1) \times A(p_2) \times \cdots \times A(p_k). \qquad (1)$$

The factors $A(p_i)$ are often called the primary components of $A$. The direct product expansion (1) is unique: each component $A(p_i)$ is simply the set of all elements whose order is a power of $p_i$.
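As a small illustration of (1), the primary components of the additive cyclic group $\mathbb{Z}_{12}$ can be listed directly from the description "all elements whose order is a power of $p$". The sketch below is written in additive notation, and the function names are our own.

```python
from math import gcd


def element_order(x, n):
    """Order of x in the additive cyclic group Z_n."""
    return n // gcd(x, n)


def is_power_of(k, p):
    """True if k = p^j for some j >= 0."""
    while k % p == 0:
        k //= p
    return k == 1


def primary_component(n, p):
    return [x for x in range(n) if is_power_of(element_order(x, n), p)]


A2 = primary_component(12, 2)
A3 = primary_component(12, 3)
print(A2)   # [0, 3, 6, 9]
print(A3)   # [0, 4, 8]

# Every element of Z_12 is a sum a + b with a in A(2), b in A(3), in exactly one way.
sums = sorted((a + b) % 12 for a in A2 for b in A3)
assert sums == list(range(12))
```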

Our purpose is to express the abelian group $A$ as a direct product of the simplest possible groups, i.e., cyclic groups. If we do not make any restrictions on the orders of the cyclic groups, then we can generally do this in many ways; to take the simplest example,

$$\langle a \mid a^6 = e \rangle = \langle a^3 \rangle \times \langle a^2 \rangle.$$

But we only have a limited choice, and, despite the non-uniqueness, the final result (Theorem 3) gives us a very useful picture of the nature of finite abelian groups.


1. Primary abelian groups. In what follows we must keep in mind that, if an abelian group $A$ is generated by subgroups $B$ and $C$, then actually $A = BC$; in addition, $A = B \times C$ if and only if $B \cap C = e$ (see Subsection 4 of §3).




Unlike other cyclic groups, the cyclic group $C_{p^n}$ of prime power order $p^n$ cannot be decomposed, i.e., cannot be written as a direct product of groups of smaller order. To see this, write $C_{p^n} = \langle a \rangle$ and $C_{p^{n-i}} = \langle a^{p^i} \rangle$. Then the chain

$$C_{p^n} \supset C_{p^{n-1}} \supset \cdots \supset C_p \supset e$$

contains all subgroups of $C_{p^n}$. Any two of them $X \ne e$ and $Y \ne e$ have a non-trivial intersection $X \cap Y \supset C_p$, and so cannot be factors in a direct product expansion.


THEOREM 1. Every finite abelian p-group is a direct product of cyclic groups. 


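Proof. We use induction, and suppose the theorem true for all abelian $p$-groups of order $< p^n$. Let $A$ be an abelian group of order $|A| = p^n$, and choose an element $a \ne e$ of maximal order $p^m$. Consider the quotient group $\bar{A} = A/\langle a \rangle$. Since $|\bar{A}| = p^{n-m} < p^n$, it follows by the induction assumption that

$$\bar{A} = \bar{A}_1 \times \cdots \times \bar{A}_r, \qquad (2)$$

where $\bar{A}_i = \langle \bar{b}_i \rangle = \langle b_i \langle a \rangle \rangle = \{\langle a \rangle, b_i \langle a \rangle, \ldots, b_i^{p^{m_i}-1} \langle a \rangle\}$ is a cyclic group of order $p^{m_i}$, and $m_1 + \cdots + m_r = n - m$. By definition,

$$b_i^{p^{m_i}} = a^{s_i}, \qquad 1 \le i \le r, \qquad (3)$$

and, although every element $x \in A$ has the form

$$x = b_1^{u_1} \cdots b_r^{u_r} a^{u},$$

in general this expression is not unique. We must "correct" the elements $b_i \in A$ in such a way that the exponents $s_i$ in (3) vanish. This is not hard to do. If we recall that $m_i \le m$ and raise both sides of (3) to the power $p^{m - m_i}$, we obtain

$$e = b_i^{p^m} = a^{s_i p^{m - m_i}},$$

so that $s_i = t_i p^{m_i}$ (by Theorem 3 of §2 Ch. 4). If we now set $a_i = b_i a^{-t_i}$, then

$$a_i^{p^{m_i}} = b_i^{p^{m_i}} a^{-t_i p^{m_i}} = a^{s_i} a^{-s_i} = e, \qquad (3')$$

where $\bar{a}_i = a_i \langle a \rangle = b_i \langle a \rangle = \bar{b}_i$, and consequently $\langle \bar{a}_i \rangle = \bar{A}_i$. Once more, we have

$$x = a_1^{u_1} \cdots a_r^{u_r} a^{u}$$

for any $x \in A$, and this expression is now unique. To see uniqueness, note that if there were two such expressions for $x$, we would obtain a relation

$$a_1^{v_1} \cdots a_r^{v_r} a^{v} = e, \qquad 0 \le v_i < p^{m_i}, \quad 0 \le v < p^m$$

(not all of the $v_i$ and $v$ vanish), which the epimorphism $A \to \bar{A}$ would take to the relation $\bar{a}_1^{v_1} \cdots \bar{a}_r^{v_r} = \bar{e}$. Because of the direct product (2), this gives $\bar{a}_i^{v_i} = \bar{e}$, $1 \le i \le r$, or, equivalently, $a_i^{v_i} \in \langle a \rangle$. But, by (3'), this can only happen if all of the $v_i = 0$; then also $v = 0$.

This contradiction shows that $A = \langle a_1 \rangle \times \cdots \times \langle a_r \rangle \times \langle a \rangle$. □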


Remark. This proof of Theorem 1 resembles the geometric proof of the theorem 


on the Jordan normal form of the matrix of a nilpotent linear operator (see the Appendix). 


THEOREM 2. If a finite abelian $p$-group $A$ is expressed in two ways as a direct product of cyclic subgroups:

$$A = A_1 \times \cdots \times A_r = B_1 \times \cdots \times B_s,$$

then $r = s$, and the orders of the $A_i$ coincide with the orders of the $B_j$ for a suitable ordering of the factors.

Proof. The theorem obviously holds if $|A| = p$. We use induction on $|A|$. It is convenient from the very beginning to order the $A_i$ and $B_j$ in such a way that their orders are non-increasing:

$$A_i = \langle a_i \rangle, \quad |A_i| = p^{m_i}, \qquad m_1 \ge m_2 \ge \cdots \ge m_q > m_{q+1} = \cdots = m_r = 1, \qquad (4)$$

$$B_j = \langle b_j \rangle, \quad |B_j| = p^{n_j}, \qquad n_1 \ge n_2 \ge \cdots \ge n_t > n_{t+1} = \cdots = n_s = 1. \qquad (5)$$

The relations

$$(xy)^p = x^p y^p, \qquad (x^{-1})^p = (x^p)^{-1},$$

which hold in any abelian group (see (3) in §1 Ch. 4), imply that the set

$$A^p = \{x^p \mid x \in A\}$$

of $p$-th powers of all elements of $A$ is a subgroup of $A$, and it does not, of course, depend on the direct product expansion of $A$. On the other hand, if

$$x = a_1^{i_1} \cdots a_q^{i_q} \cdots a_r^{i_r} = b_1^{j_1} \cdots b_t^{j_t} \cdots b_s^{j_s},$$

then, using (4) and (5), we have

$$x^p = (a_1^p)^{i_1} \cdots (a_q^p)^{i_q} = (b_1^p)^{j_1} \cdots (b_t^p)^{j_t}.$$

Hence,

$$A^p = \langle a_1^p \rangle \times \cdots \times \langle a_q^p \rangle = \langle b_1^p \rangle \times \cdots \times \langle b_t^p \rangle,$$

where $a_i^p$, $b_j^p$ are elements of order $p^{m_i - 1}$ and $p^{n_j - 1}$, respectively.

Since $|A^p| < |A|$, it follows by the induction assumption that $q = t$ and $m_1 - 1 = n_1 - 1, \ldots, m_q - 1 = n_q - 1$; hence, $m_1 = n_1, \ldots, m_q = n_q$. If we further note that

$$|A_{q+1} \times \cdots \times A_r| = p^{r-q}, \qquad |B_{q+1} \times \cdots \times B_s| = p^{s-q},$$

we find that

$$p^{m_1 + \cdots + m_q}\, p^{r-q} = |A| = p^{m_1 + \cdots + m_q}\, p^{s-q}.$$

Hence, $s = r$, and all of the assertions in the theorem have been proved. □


The orders $p^{m_1}, \ldots, p^{m_r}$ of the cyclic factors are called the invariants (or elementary divisors) of the finite abelian $p$-group $A$. If two abelian $p$-groups $A$ and $B$ have the same invariants, then

$$A = A_1 \times \cdots \times A_r, \qquad B = B_1 \times \cdots \times B_r, \qquad |A_i| = |B_i|,$$

and the set of isomorphisms $\varphi_i : A_i \to B_i$ induces an isomorphism $\varphi : A \to B$ given by $\varphi(a_1 \cdots a_r) = \varphi_1(a_1) \cdots \varphi_r(a_r)$. Hence, Theorem 2 says that the group $A$ is determined up to isomorphism by its invariants. In particular, we have the following

COROLLARY. The number of non-isomorphic abelian groups of order $p^n$ is equal to the number $p(n)$ of partitions

$$n = n_1 + n_2 + \cdots + n_r, \qquad n_1 \ge n_2 \ge \cdots \ge n_r > 0, \quad 1 \le r \le n. \qquad \square$$
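The Corollary turns the classification of abelian groups of order $p^n$ into an enumeration of partitions, which is easy to carry out mechanically. The following sketch lists the invariants for each isomorphism class; the generator `partitions` and the helper `abelian_p_groups` are illustrative names, not notation from the text.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def abelian_p_groups(p, n):
    """Invariants (p^{n_1}, ..., p^{n_r}), one tuple per isomorphism class."""
    return [tuple(p**k for k in part) for part in partitions(n)]


print(abelian_p_groups(2, 4))
# [(16,), (8, 2), (4, 4), (4, 2, 2), (2, 2, 2, 2)]
print([len(list(partitions(n))) for n in range(1, 8)])
# [1, 2, 3, 5, 7, 11, 15]
```

For example, there are $p(4) = 5$ abelian groups of order 16, whatever method is used to list them.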


We encountered the partition function $p(n)$ when we were describing the conjugacy classes in the symmetric group $S_n$ (see Exercise 4 of §2). An abelian group of order $p^r$ with invariants $p, \ldots, p$ is often called an elementary abelian group. An elementary abelian group is characterized by the property that $A^p = e$. If we switch to additive notation, we note that an abelian group $A$ for which $pA = 0$ (where $p$ is a prime) is a vector space over the finite field $\mathbb{F}_p$ of $p$ elements. To see this, think of the elements of $\mathbb{F}_p = \mathbb{Z}_p$ as residue classes $\bar{k}$ modulo $p$ for $k \in \mathbb{Z}$, and set $\bar{k}a = ka$, $a \in A$. This gives an action of $\mathbb{F}_p$ on $A$ which makes $A$ into an $\mathbb{F}_p$-vector space. The action is correctly defined, because, if $\bar{k} = \bar{k}'$, then $(k - k')a$ is of the form $\ell(pa) = 0$. The expansion of $A$ as a direct product of cyclic subgroups corresponds to an expansion of $A$ considered as an $\mathbb{F}_p$-space as a direct sum of one-dimensional subspaces (using the basis theorem). Thus,

$$A \cong \mathbb{Z}_p \oplus \cdots \oplus \mathbb{Z}_p \quad (r \text{ summands}).$$

The amount of choice in the one-dimensional subspaces even when $r = 2$ is clear from the example in §4: $\mathbb{Z}_p \oplus \mathbb{Z}_p$ has $p(p+1)$ different expansions.


2. The structure theorem for finite abelian groups. Using the unique expansion


(1), along with Theorems 1 and 2, we immediately arrive at the following fundamental 


fact about abelian groups. 


THEOREM 3. Every finite abelian group A _ is a direct product of primary cyclic 
subgroups. Any two such product expansions have the same number of factors of each 
cyclic order. O 


Borrowing the vector space terminology, we say that elements $a_1, \ldots, a_r$ of orders $d_1, \ldots, d_r$, respectively, form a basis of an abelian group $A$ if every element $x \in A$ can be uniquely written in the form

$$x = a_1^{i_1} \cdots a_r^{i_r}, \qquad 0 \le i_k < d_k.$$

Of course, in that case we have

$$A = \langle a_1 \rangle \times \cdots \times \langle a_r \rangle, \qquad |A| = d_1 d_2 \cdots d_r. \qquad (6)$$

Theorem 3 is equivalent to the statement that every finite abelian group $A$ has a basis whose elements are primary (i.e., their orders $d_i$ are powers of prime divisors of $|A|$), and that the set $\{d_1, d_2, \ldots, d_r\}$ does not depend on the choice of basis. Because the $d_i$ only depend on $A$, and not on the choice of basis, they are called the invariants or elementary divisors of $A$, just as in the case of primary groups. We sometimes say that $\{d_1, \ldots, d_r\}$ is the type of the finite abelian group $A$.


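Given a finite abelian group $A$, we shall write out all of the invariants in rows corresponding to the different prime divisors of $|A|$, as follows:

$$\begin{array}{llll}
p_1^{n_{11}}, & p_1^{n_{12}}, & \ldots, & p_1^{n_{1\ell}}, \\
p_2^{n_{21}}, & p_2^{n_{22}}, & \ldots, & p_2^{n_{2\ell}}, \\
\ldots & & & \\
p_k^{n_{k1}}, & p_k^{n_{k2}}, & \ldots, & p_k^{n_{k\ell}},
\end{array} \qquad n_{i1} \ge n_{i2} \ge \cdots \ge n_{i\ell} \ge 0.$$

We may take all of the rows to be the same length $\ell$ if we fill in some of the rows with ones.

The integers

$$m_j = p_1^{n_{1j}} p_2^{n_{2j}} \cdots p_k^{n_{kj}}, \qquad j = 1, 2, \ldots, \ell,$$

are called the invariant factors of $A$. By construction, we have

$$|A| = m_1 m_2 \cdots m_\ell, \qquad m_{j+1} \mid m_j, \quad j = 1, 2, \ldots, \ell - 1. \qquad (7)$$

The expansion (6), written in the form

$$A = (\langle a_{11} \rangle \times \cdots \times \langle a_{k1} \rangle) \times \cdots \times (\langle a_{1\ell} \rangle \times \cdots \times \langle a_{k\ell} \rangle),$$

where $a_{ij}$ denotes the basis element of order $p_i^{n_{ij}}$, gives us an expansion

$$A = \langle u_1 \rangle \times \langle u_2 \rangle \times \cdots \times \langle u_\ell \rangle \qquad (8)$$

whose cyclic direct factors have order $m_1, m_2, \ldots, m_\ell$. To obtain the expansion (8), it suffices to set

$$u_j = a_{1j} a_{2j} \cdots a_{kj}, \qquad j = 1, \ldots, \ell,$$

and use the proposition at the end of Subsection 3 of §2 Ch. 4.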

If $A$ is a primary group, then the direct products (6) and (8) are obviously the same, but in general (8) is more economical than (6) ($\ell \le r \le k\ell$). Note that the expansion (8) immediately gives an element $u_1$ of maximal order $m_1$. The integer $m_1$ is called the exponent of the group $A$. An abelian group $A$ is cyclic if and only if its exponent is equal to $|A|$.

Finally, note that we can always find an abelian group with any given invariant factors $m_j$: we need only take the direct sum of the cyclic groups $\mathbb{Z}_{m_1}, \mathbb{Z}_{m_2}, \ldots, \mathbb{Z}_{m_\ell}$.


As an example, let us compute all abelian groups of order 16 and of order 36.

Order $16 = 2^4$:

    Z_16,   Z_8 ⊕ Z_2,   Z_4 ⊕ Z_4,   Z_4 ⊕ Z_2 ⊕ Z_2,   Z_2 ⊕ Z_2 ⊕ Z_2 ⊕ Z_2.

Order $36 = 2^2 \cdot 3^2$:

    elementary divisors          invariant factors
    Z_4 ⊕ Z_9                    Z_36
    Z_2 ⊕ Z_2 ⊕ Z_9              Z_18 ⊕ Z_2
    Z_4 ⊕ Z_3 ⊕ Z_3              Z_12 ⊕ Z_3
    Z_2 ⊕ Z_2 ⊕ Z_3 ⊕ Z_3        Z_6 ⊕ Z_6
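Let us consider one more example. We write the group $\mathbb{Z}_{72} \oplus \mathbb{Z}_{84}$ in terms of invariant factors. We first express each cyclic summand in terms of cyclic primary components:

$$\mathbb{Z}_{72} \oplus \mathbb{Z}_{84} = (\mathbb{Z}_8 \oplus \mathbb{Z}_9) \oplus (\mathbb{Z}_4 \oplus \mathbb{Z}_3 \oplus \mathbb{Z}_7).$$

Next, we gather together all of the primary components:

$$\mathbb{Z}_{72} \oplus \mathbb{Z}_{84} = (\mathbb{Z}_8 \oplus \mathbb{Z}_4) \oplus (\mathbb{Z}_9 \oplus \mathbb{Z}_3) \oplus \mathbb{Z}_7$$

(this is a direct sum of $p$-Sylow subgroups). Finally, we take the cyclic summand of highest order in each primary component, and repeat this process with the summands that remain:

$$\mathbb{Z}_{72} \oplus \mathbb{Z}_{84} = (\mathbb{Z}_8 \oplus \mathbb{Z}_9 \oplus \mathbb{Z}_7) \oplus (\mathbb{Z}_4 \oplus \mathbb{Z}_3) = \mathbb{Z}_{504} \oplus \mathbb{Z}_{12}.$$

Note that we would get the same result if we had started with $\mathbb{Z}_{36} \oplus \mathbb{Z}_{168}$; thus

$$\mathbb{Z}_{72} \oplus \mathbb{Z}_{84} = \mathbb{Z}_{36} \oplus \mathbb{Z}_{168}$$

(strictly speaking, we should use $\cong$ instead of $=$ here). In particular, note that the exponent of both groups is 504.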
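The three steps of this example (split into primary cyclic summands, gather by prime, multiply the largest remaining summands) can be sketched as a short Python routine. The input is a list of orders of cyclic summands; the function names are hypothetical, and trial division is used only because the numbers are small.

```python
from collections import defaultdict


def prime_power_factors(n):
    """Return pairs (p, p^k) for the prime powers exactly dividing n."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append((p, q))
        p += 1
    if n > 1:
        factors.append((n, n))
    return factors


def invariant_factors(cyclic_orders):
    """Invariant factors m_1, m_2, ... (each dividing the previous one)."""
    rows = defaultdict(list)                 # one row of invariants per prime
    for order in cyclic_orders:
        for p, q in prime_power_factors(order):
            rows[p].append(q)
    sorted_rows = [sorted(row, reverse=True) for row in rows.values()]
    length = max((len(row) for row in sorted_rows), default=0)
    result = []
    for j in range(length):
        m = 1
        for row in sorted_rows:
            if j < len(row):                 # shorter rows are padded with ones
                m *= row[j]
        result.append(m)
    return result


print(invariant_factors([72, 84]))    # [504, 12]
print(invariant_factors([36, 168]))   # [504, 12]
```

The first entry of the result is the exponent of the group, here 504 in both cases.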
EXERCISES 


1. Prove Theorem 1 and the first half of Theorem 3 without passing to quotient 


groups. 


2. Obtain an expansion for a finite abelian group A_ into a direct product of pri- 
mary components without using the Sylow theorems, and of course without using Theorem 3. 
In particular, Example 1 in Subsection 1 of §3 or the proposition in Subsection 3 of §2 Ch. 4


can be used to obtain the expansion 


Wee MS Go Gh oan te |, = P (where P, are the prime divisors). 


3. Show that, if $A$ is a finite abelian group, then $A$ has at least one subgroup of each order $d$ dividing $|A|$ (this is a converse to Lagrange's Theorem).

4. Show that, for a suitable ordering, the invariants of any subgroup of an abelian group are divisors of the invariants of the group.

5. Prove that, if $A \oplus A \cong B \oplus B$, where $A$ and $B$ are finite abelian groups, then $A \cong B$.

6. Prove that, if $A$, $B$ and $C$ are finite abelian groups and $A \oplus C \cong B \oplus C$, then $A \cong B$.

7. Show that an abelian group with invariant factors $m_1, \ldots, m_\ell$ cannot be generated by fewer than $\ell$ elements.

8. Show that a finite abelian group whose order is not divisible by the square of any integer greater than 1 must be cyclic.

9. List all non-isomorphic abelian groups of order 72.


e 


, Fo? 
10. Are the groups Zo @ Zo, and Zi ®@ Z 49 isomorphic? 


Chapter 8. Elements of Representation Theory

Before giving the precise definitions of representation theory, we shall discuss two problems which are similar in spirit.


Problem 1. In the $(m+1)$-dimensional vector space $V_m$ consisting of real homogeneous degree $m$ polynomials

$$f(x, y) = a_0 x^m + a_1 x^{m-1} y + \cdots + a_{m-1} x y^{m-1} + a_m y^m$$

(or rather, polynomial functions $(x, y) \mapsto f(x, y)$), we consider the set of solutions of the two-dimensional Laplace equation

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0 \qquad (*)$$

(see Exercise 9 in §1 Ch. 6). The Laplace operator $\Delta = \dfrac{\partial^2}{\partial x^2} + \dfrac{\partial^2}{\partial y^2}$ is linear:

$$\Delta(\alpha f + \beta g) = \alpha \Delta f + \beta \Delta g, \qquad \forall \alpha, \beta \in \mathbb{R}.$$

Hence, the solutions of equation (*) form a subspace $H_m$ of $V_m$. We immediately find

$$\Delta f = \sum_{k=0}^{m-2} \left[ (m-k)(m-k-1) a_k + (k+2)(k+1) a_{k+2} \right] x^{m-k-2} y^k.$$

Consequently,

$$\Delta f = 0 \iff (m-k)(m-k-1) a_k + (k+2)(k+1) a_{k+2} = 0, \qquad 0 \le k \le m-2,$$

and all of the coefficients $a_k$ can be expressed in terms of two of them, say, $a_0$ and $a_1$. Thus, $\dim H_m \le 2$.

But it is possible to give two linearly independent solutions right away. Namely, if we extend the $\Delta$ operator by linearity to polynomials with complex coefficients, then we have

$$\Delta (x + iy)^m = m(m-1)(x + iy)^{m-2} + i^2 m(m-1)(x + iy)^{m-2} = 0, \qquad i^2 = -1.$$

Separating the real and imaginary parts, we obtain

$$z^m = (x + iy)^m = u_m(x, y) + i v_m(x, y),$$

so that

$$\Delta u_m = 0, \qquad \Delta v_m = 0.$$

Thus,

$$H_m = \langle u_m(x, y), v_m(x, y) \rangle_{\mathbb{R}}.$$
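The coefficient formula for $\Delta f$ makes it easy to confirm, for small $m$, that the real and imaginary parts of $(x + iy)^m$ are harmonic. In the sketch below a homogeneous polynomial of degree $m$ is stored as its coefficient list `[a_0, ..., a_m]`; this representation and the function names are choices made for the illustration.

```python
from math import comb


def laplacian_coeffs(a):
    """a[k] is the coefficient of x^(m-k) y^k.

    Returns the coefficients of the Laplacian, where entry k multiplies
    x^(m-k-2) y^k, for k = 0, ..., m-2.
    """
    m = len(a) - 1
    return [(m - k) * (m - k - 1) * a[k] + (k + 2) * (k + 1) * a[k + 2]
            for k in range(m - 1)]


def real_and_imag_parts(m):
    """Coefficient lists of u_m = Re (x+iy)^m and v_m = Im (x+iy)^m."""
    u = [comb(m, k) * (-1) ** (k // 2) if k % 2 == 0 else 0
         for k in range(m + 1)]
    v = [comb(m, k) * (-1) ** ((k - 1) // 2) if k % 2 == 1 else 0
         for k in range(m + 1)]
    return u, v


for m in range(2, 9):
    u, v = real_and_imag_parts(m)
    assert all(c == 0 for c in laplacian_coeffs(u))
    assert all(c == 0 for c in laplacian_coeffs(v))

print(real_and_imag_parts(3))   # ([1, 0, -3, 0], [0, 3, 0, -1])
```

For $m = 3$ this recovers $u_3 = x^3 - 3xy^2$ and $v_3 = 3x^2 y - y^3$.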


If we now interpret $x$ and $y$ as the coordinates of a vector in the Euclidean space $\mathbb{R}^2$ with a fixed rectangular system of coordinates, we can see what happens under an orthogonal change of coordinates, i.e., when the plane $\mathbb{R}^2$ is rotated about the origin through an angle $\theta$:

$$x' = \Phi_\theta(x) = x \cos\theta - y \sin\theta,$$
$$y' = \Phi_\theta(y) = x \sin\theta + y \cos\theta.$$

The chain rule of calculus (which is easy to verify directly for polynomials) gives

$$\frac{\partial^2 f}{\partial x'^2} = \frac{\partial^2 f}{\partial x^2} \cos^2\theta - 2 \frac{\partial^2 f}{\partial x\, \partial y} \cos\theta \sin\theta + \frac{\partial^2 f}{\partial y^2} \sin^2\theta,$$

$$\frac{\partial^2 f}{\partial y'^2} = \frac{\partial^2 f}{\partial x^2} \sin^2\theta + 2 \frac{\partial^2 f}{\partial x\, \partial y} \cos\theta \sin\theta + \frac{\partial^2 f}{\partial y^2} \cos^2\theta,$$

and hence

$$\frac{\partial^2 f}{\partial x'^2} + \frac{\partial^2 f}{\partial y'^2} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}.$$

This means that the equation (*) is invariant under an orthogonal change of variables, or, to put it another way, under the action of the group $SO(2) = \{\Phi_\theta\}$. In particular, the polynomials $u_m(x', y')$ and $v_m(x', y')$ are solutions of (*), and so can be expressed as a linear combination of $u_m(x, y)$ and $v_m(x, y)$. Thus, the group $SO(2)$ acts on the space of solutions of the Laplace equation. We call this a two-dimensional, real, linear representation

$$\Phi^{(m)} : \Phi_\theta \mapsto \Phi^{(m)}(\theta)$$

of the group $SO(2)$.


If we return once again to complex polynomials, we notice that

$$x' + iy' = e^{i\theta}(x + iy),$$

$$(x' + iy')^m = e^{im\theta}(x + iy)^m.$$

Letting the complexified linear operator $\Phi^{(m)}(\theta)$ keep its earlier meaning, we have

$$\Phi^{(m)}(\theta) : z^m \mapsto e^{im\theta} z^m.$$

The so-called one-dimensional unitary representations $\Phi^{(m)} : \theta \mapsto e^{im\theta}$, $m \in \mathbb{Z}$, of the group $SO(2)$ play an important role in analysis.

We note that the action $\Phi_\theta$ induces an action of $SO(2)$ on the whole space $V_m$; from this point of view, $H_m$ is an invariant subspace of $V_m$.
m 




Problem 2. Estimating the number of organic compounds, for example, in the 
chemistry of cyclic hydrocarbons, leads to the following general problem: How many 
different necklaces of length n can be made from (an unlimited supply of) pearls of q 
different colors? 

Following Pólya, we shall attempt to answer this question by first supposing that the necklaces are oriented, i.e., a necklace and the same necklace turned upside-down are counted separately. Note that the number of possible sections of thread with $n$ pearls is equal to $q^n$ (the number of words of length $n$ in the free group with $q$ generators).

The cyclic group $\langle \sigma \rangle$ of order $n$ with generator $\sigma = (1\, 2\, \ldots\, n)$ acts on the set $\Omega$ of these pieces of thread by permuting the pearls cyclically on the thread. It is natural to think of a necklace as the $\langle \sigma \rangle$-orbit of a section of thread, or, if we want, as a certain set of concentric circles (see Fig. 20). The second interpretation is easiest to visualize. It is connected with the isomorphism

$$\langle \sigma \rangle \cong \left\langle \begin{pmatrix} \cos\frac{2\pi}{n} & -\sin\frac{2\pi}{n} \\ \sin\frac{2\pi}{n} & \cos\frac{2\pi}{n} \end{pmatrix} \right\rangle,$$

which we have encountered before, and which we will soon be calling a two-dimensional linear real representation of the group $\langle \sigma \rangle$. The number $r$ of necklaces can now be expressed by the formula in Exercise 8 of §2 Ch. 7:

$$r = \frac{1}{n} \sum_{k=0}^{n-1} N(\sigma^k).$$

If $d \mid n$, then the element $\sigma^d$ of order $n/d$ leaves fixed the sections of thread (and necklaces) which can be divided into $n/d$ periods of length $d$ (in this connection, see Exercise 12 of §2 Ch. 4). Hence, $N(\sigma^d) = q^d$, and, more generally, $N(\sigma^k) = q^{\mathrm{g.c.d.}(n,k)}$. Exactly $\varphi(n/d)$ (where $\varphi$ is the Euler function) of the $N(\sigma^k)$ in $\sum_k N(\sigma^k)$ have g.c.d.$(n, k) = d$. This means that

$$r = \frac{1}{n} \sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right) q^d.$$

If we are interested in physically different (un-oriented) necklaces, we must identify elements in $\Omega$ by means of a two-dimensional linear representation of the dihedral group $D_n$. We leave this for the reader to do on his own.

Not only in these contrived examples, but in actual physical problems linear 
representations of groups inevitably arise as the reflection of some symmetry in the problem. 
The ideas and language of representation theory are very natural. In fact, the examples we 
shall give in §1 relate to old problems and at first glance do not seem to be anything new.

But the very fact that all of these examples have been brought together "under one roof" 
should suggest that we are dealing with a concept having fundamental importance. 

Representation theory has two aims: (1) in pure mathematics, the development of 
new techniques for investigating various groups, and (2) in applications, as a powerful tool 
in such areas as crystallography and quantum mechanics. In this chapter our concern will be
to say something substantial about representation theory using only the material that is 


already accessible to us from linear algebra and group theory. 
§1. Definitions and examples of linear representations 


1. Basic concepts. Strictly speaking, we have already worked with representation theory, when we studied the action of groups on sets (§2 Ch. 7). We now take our set to be a vector space $V$ of dimension $n$ over a field $K$, and in the group $S(V)$ of all bijective set maps $V \to V$ we consider the subgroup $GL(V)$ of all invertible linear operators on $V$ (i.e., the group of automorphisms of the vector space $V$). Clearly, given a basis $\{e_1, \ldots, e_n\}$ of $V$, the group $GL(V)$ can be identified with the usual matrix group $GL(n, K)$, the group of automorphisms of the vector space $K^n$. Then to every linear operator $\mathcal{A} \in GL(V)$ there corresponds a matrix $A = (a_{ij})$ such that

$$\mathcal{A} e_j = \sum_{i=1}^{n} a_{ij} e_i, \qquad \det A \ne 0.$$

Definition 1. Let $G$ be a group. Any homomorphism $\Phi: G \to GL(V)$ is called
a linear representation of $G$ in the vector space $V$. A representation is called faithful
if the kernel $\operatorname{Ker} \Phi$ of the representation consists only of the identity element of $G$, and
it is called the trivial representation (or the unit representation) if $\Phi(g)$ is the identity
operator $\mathcal{E}$ for all $g \in G$. The dimension $\dim_K V$ is called the dimension of the
representation. When $K = \mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$, we speak of a rational, real, or complex
representation, respectively.


Thus, a linear representation is a pair $(\Phi, V)$ consisting of a representation space
$V$ (also called a $G$-space) and a homomorphism $\Phi: G \to GL(V)$. By definition, we have

\[ \Phi(e) = \mathcal{E} \ \text{(the identity operator)}; \qquad \Phi(gh) = \Phi(g)\Phi(h) \ \text{ for all } g, h \in G. \]

If we agree to let $g * v$ denote the action of the linear operator $\Phi(g)$ on the vector
$v \in V$, then we arrive at the relations

\[ \begin{aligned}
g*(u+v) &= g*u + g*v, \\
g*(\lambda v) &= \lambda (g*v), \qquad \lambda \in K, \\
e*v &= v, \\
(gh)*v &= g*(h*v),
\end{aligned} \tag{1} \]

which imitate the properties of linear operators (the last two are the same as the relations
expressed just before using $\Phi$; compare with (i) and (ii) in §2 Ch. 7). Relations
(1) highlight the role of the $G$-space $V$ in the linear representation $(\Phi, V)$; this is
often convenient, especially in situations where $V$ is not just an abstract vector space, but
has a concrete meaning.
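
As a concrete illustration of Definition 1, here is a small sketch in Python. The choice of group and matrix is an assumption made for the example: the cyclic group $\mathbb{Z}_4$ is sent to powers of the rotation of the plane by 90 degrees. The code checks the homomorphism property $\Phi(g+h) = \Phi(g)\Phi(h)$ and computes the kernel, which turns out to be trivial, so this two-dimensional real representation is faithful.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


IDENTITY = [[1, 0], [0, 1]]
ROTATION = [[0, -1], [1, 0]]  # rotation of the plane by 90 degrees (assumed example)


def phi(k):
    """Phi(k) = ROTATION^(k mod 4) for k in the additive group Z_4."""
    M = IDENTITY
    for _ in range(k % 4):
        M = matmul(M, ROTATION)
    return M


def is_homomorphism():
    """Check Phi(g + h) = Phi(g) Phi(h) for all g, h in Z_4."""
    return all(phi((g + h) % 4) == matmul(phi(g), phi(h))
               for g in range(4) for h in range(4))


def kernel():
    """Elements of Z_4 that are sent to the identity matrix."""
    return [g for g in range(4) if phi(g) == IDENTITY]
```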


On the other hand, the vector space $V$ need not be explicitly indicated; we can think
of a linear representation simply as a homomorphism $\Phi$ from $G$ to $GL(n, K)$. As
before, we have $\Phi_{gh} = \Phi_g \Phi_h$, but now $\Phi_g$ is a non-singular matrix, and $\Phi_e = E$ is
the identity matrix. The matrix point of view is usually better for computational purposes,
but it is less invariant and lacks the geometrical clarity of the vector space point of view.
In practice, it is important to be able to go back and forth freely between the $G$-space and
the matrix interpretations.

In this connection, recall the basic fact from linear algebra that two matrices $A$
and $B$ which correspond to the same operator but in different bases are similar:
$B = CAC^{-1}$ ($C$ is the transfer matrix from one basis to the other). In the situation of
representation theory, when we are dealing with groups of linear operators, we take into
account the dependence on the choice of basis in the following way.


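Definition 2. Two linear representations $(\Phi, V)$ and $(\Psi, W)$ of a group $G$ are
said to be equivalent (or isomorphic) if there exists an isomorphism of vector spaces
$\sigma: V \to W$ such that the diagram

\[ \begin{array}{ccc}
V & \xrightarrow{\ \sigma\ } & W \\
{\scriptstyle \Phi(g)} \downarrow & & \downarrow {\scriptstyle \Psi(g)} \\
V & \xrightarrow{\ \sigma\ } & W
\end{array} \]

is commutative for all $g \in G$, i.e.,
\[ \Psi(g)\,\sigma = \sigma\,\Phi(g), \]
or, equivalently,
\[ \Psi(g) = \sigma\,\Phi(g)\,\sigma^{-1} \tag{2} \]

(compare with the definition of equivalent actions of a group on sets, which was given in
Exercise 1 of §2 Ch. 7). We shall sometimes write $\Phi \approx \Psi$ for equivalent representations,
$\Phi \not\approx \Psi$ for inequivalent representations.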


Here are two variants of Definition 2. 




(a) G-space terminology. Let $G$ be a group, and let $V: (g,v) \mapsto g*v$ and
$W: (g,w) \mapsto g \circ w$ be two $G$-spaces with $*$ and $\circ$ satisfying condition (1). A
vector space isomorphism $\sigma: V \to W$ is called an isomorphism of $G$-spaces if

\[ \sigma(g*v) = g \circ \sigma(v) \tag{2'} \]

for all $g \in G$ and $v \in V$. In that case we also say that the map $\sigma$ commutes with the
action of $G$.

(b) Matrix terminology. If $V = \langle v_1, \ldots, v_n \rangle$, $W = \langle w_1, \ldots, w_n \rangle$, and
$\Phi_g$ and $\Psi_g$ are the matrices of the linear maps $\Phi(g)$ and $\Psi(g)$ in these bases, then
the condition (2) for equivalence can be written in the form

\[ \Psi_g = C\,\Phi_g\,C^{-1}, \tag{2''} \]

where $C$ is some non-singular matrix which is the same for all $g \in G$. The entries in
all of these matrices are in the same field $K$.

The relation of similarity of matrices, which is expressed by (2''), is an equivalence
relation which divides the set $M_n(K)$ into disjoint equivalence classes. In the same way,
the representations of a group G_ divide up into classes of equivalent representations. It 
will soon be clear that what is important and interesting is precisely the equivalence classes 
of representations. 

Again using linear algebra, we try to give a clearer picture of how a group $G$ acts
on a space $V$. If $\mathcal{A}: V \to V$ is a linear operator, there may exist an invariant subspace
$U$, i.e., one for which $u \in U \Rightarrow \mathcal{A}u \in U$. If we take an arbitrary basis $\{e_1, \ldots, e_k\}$ in
$U$ and extend it to a basis $\{e_1, \ldots, e_k, e_{k+1}, \ldots, e_n\}$ for all of $V$, we see that
the matrix of $\mathcal{A}$ in the basis $\{e_1, \ldots, e_n\}$ has the following triangular block form:

\[ A = \begin{pmatrix} A_1 & A_0 \\ 0 & A_2 \end{pmatrix}. \]

The block $A_1$ corresponds to the invariant subspace $U$, and the block $A_2$ corresponds
to the quotient space $V/U$. If $A_0$ happens to be the zero matrix, then $A = A_1 \dotplus A_2$
is the direct sum of the blocks, and $V = U \oplus W$ is a direct sum of invariant subspaces.
We can always find an eigen-vector, i.e., $v \in V$, $v \neq 0$, for which $\mathcal{A}v = \lambda v$,
$\lambda \in K$, if we suppose that $K$ is algebraically closed (see §3 Ch. 6), for example, the
field $\mathbb{C}$ of complex numbers. Here $\lambda$ is a root of the characteristic polynomial

\[ f_A(t) = |tE - A| = t^n - (\operatorname{tr} A)\,t^{n-1} + \cdots + (-1)^n \det A \]

($A$ is the matrix of $\mathcal{A}$ in any basis). Using eigen-vectors, we can easily choose a basis of
$V$ with respect to which $A$ has the triangular form

\[ \begin{pmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} \]

with roots $\lambda_1, \lambda_2, \ldots, \lambda_n$ along the diagonal. A more careful analysis allows us to
reduce $A$ to the so-called Jordan normal form $J(A)$ (see the Appendix), which is a
direct sum of the Jordan cells

\[ J_{m,\lambda} = \begin{pmatrix}
\lambda & 1 & & 0 \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda
\end{pmatrix} \]

($m \times m$ is the size of the cell, and $\lambda$ is a root of the characteristic polynomial).


Note that if we have $A^q = E$, then it follows that $J_{m,\lambda}^q = E_m$ is the $m \times m$
identity matrix for each Jordan cell $J_{m,\lambda}$, and this is obviously only possible when $m = 1$
and $\lambda$ is a $q$-th root of 1 (let us suppose that $K = \mathbb{C}$). Thus,

\[ A^q = E \ \Longrightarrow\ CAC^{-1} = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), \qquad \lambda_i^q = 1, \tag{3} \]

for a suitable invertible matrix $C$. Alternatively, this can be shown using the fact that the
polynomial $t^q - 1$, which $A$ satisfies, has no multiple roots.

These properties of a single linear operator $\mathcal{A}: V \to V$ should be borne in mind when
we study a group $\Phi(g)$, $g \in G$, of linear operators.


Definition 3. Let $(\Phi, V)$ be a linear representation of a group $G$. A subspace
$U \subset V$ is called $G$-invariant (or $G$-stable) if $\Phi(g)u \in U$ for all $u \in U$ and all
$g \in G$. The zero subspace and the entire space $V$ are called the trivial invariant
subspaces. A representation all of whose invariant subspaces are trivial is called irreducible.
A representation is called reducible if it has at least one non-trivial invariant subspace.

According to what was said above, if $(\Phi, V)$ is a reducible representation and $U$
is an invariant subspace, then $V$ has a basis relative to which

\[ \Phi_g = \begin{pmatrix} \Phi'_g & \Phi^0_g \\ 0 & \Phi''_g \end{pmatrix} \tag{4} \]

for all $g \in G$. Since $\Phi'_{gh} = \Phi'_g \Phi'_h$, $\Phi'_e = E$ and $\Phi_g(U) \subset U$, it follows that the map
$\Phi': g \mapsto \Phi'_g$ gives a representation on $U$, which is called a subrepresentation of $\Phi$.
In that case we also have a representation on $V/U$, which is called a quotient
representation; it is given by the matrices $\Phi''_g$.

If it is possible to choose a basis of $V$ in such a way that all of the matrices $\Phi^0_g$
in (4) are zero, then we say that $\Phi$ is the direct sum of the representations $\Phi'$ and
$\Phi''$: $\Phi = \Phi' \dotplus \Phi''$. A representation $(\Phi, V)$ has a direct sum decomposition if and only if
it has an invariant subspace $U \subset V$ for which there is an invariant complement $W$, i.e.,
$V = U \oplus W$, where $\Phi(g)U \subset U$ and $\Phi(g)W \subset W$. In this case $\Phi'$ is the restriction
$\Phi|_U$ of $\Phi$ to $U$ and $\Phi''$ is the restriction of $\Phi$ to $W$. A linear representation
$(\Phi, V)$ is called indecomposable (and $V$ is called an indecomposable $G$-space) if it
cannot be written as a direct sum of two non-trivial subrepresentations.


If we successively write $V$, $U$, $W$, etc. as direct sums of invariant subspaces
(when this is possible), we obtain a direct sum $V = V_1 \oplus \cdots \oplus V_r$ of invariant subspaces
(equivalently, we obtain a direct sum $\Phi = \Phi^{(1)} \dotplus \cdots \dotplus \Phi^{(r)}$ of representations). For a
suitable choice of basis in $V$, the matrices of the linear operators are of the form

\[ \Phi_g = \begin{pmatrix}
\Phi^{(1)}_g & & & 0 \\
 & \Phi^{(2)}_g & & \\
 & & \ddots & \\
0 & & & \Phi^{(r)}_g
\end{pmatrix}. \]

Definition 4. A linear representation $(\Phi, V)$ of a group $G$ is said to be
completely reducible if it is a direct sum of irreducible representations. In that case we
also call $V$ a completely reducible $G$-space.

It is intuitively clear that the irreducible representations play the role of building
blocks which are used to construct arbitrary linear representations. The completely
reducible representations are obtained from them using the simplest construction, the
direct sum. We shall later see that in many cases this is sufficient for constructing all
representations. We should remark that some groups which are important in physics, for
example, the Lorentz group, have infinite-dimensional irreducible representations. Of
course, such representations cannot in any way be reduced to finite-dimensional ones, and
they must be studied separately.


2. Examples of linear representations. We have introduced all of the basic concepts
of representation theory. In order to acquire a solid understanding of these ideas, it is very
useful to start by becoming familiar with (and taking pains to understand in depth) the
following examples.


Example 1. By its definition, the general linear group $GL(n,K)$ over a field $K$
has a faithful irreducible $n$-dimensional linear representation with representation space
$V = K^n$. Any linear group $H \subset GL(n,K)$ acts faithfully on this $V$, but the action may
be reducible.

Similar remarks apply to the other classical groups in §1 Ch. 7. For example, the
unitary group $U(n)$ acts irreducibly on a Hermitian space, and the orthogonal group
$O(n)$ acts on Euclidean space. This all follows immediately from the stronger assertion
(proved in a basic linear algebra course) that the groups $U(n)$ and $O(n)$ act transitively
(in the sense of Example 3 in Subsection 3 of §2 Ch. 7) on the set of vectors of unit
length.


Example 2. If we make $GL(n,K)$ act on the vector space $M_n(K)$ of $n \times n$
matrices by the rule $\Psi_A: X \mapsto AX$ ($A \in GL(n,K)$, $X \in M_n(K)$), we easily see that
$\Psi_A(\alpha X + \beta Y) = \alpha \Psi_A X + \beta \Psi_A Y$ and $\Psi_{AB} = \Psi_A \Psi_B$. Hence, $(\Psi, M_n(K))$ is an
$n^2$-dimensional linear representation. Let $M_n^{(i)}(K)$ be the subspace of matrices

\[ \begin{pmatrix}
0 & \cdots & x_{1i} & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & x_{ni} & \cdots & 0
\end{pmatrix} \]

with only one non-zero column $X^{(i)}$. It is easy to check that this subspace is invariant
under $\Psi_A$ for $A \in GL(n,K)$, is irreducible, and is isomorphic (as a $GL(n,K)$-space)
to the natural $GL(n,K)$-space $K^n$ in Example 1. Thus,

\[ M_n(K) = M_n^{(1)}(K) \oplus \cdots \oplus M_n^{(n)}(K) \]

is a direct sum decomposition of $M_n(K)$ into $n$ isomorphic $GL(n,K)$-subspaces; it
corresponds to a direct sum decomposition

\[ \Psi = \Psi^{(1)} \dotplus \cdots \dotplus \Psi^{(n)} \]

into $n$ equivalent representations. Symbolically, this can be written

\[ M_n(K) \cong n\,M_n^{(1)}(K); \qquad \Psi \approx n\,\Psi^{(1)}. \]


Example 3. We now define an action $\Phi$ of the group $GL(n,K)$ on $M_n(K)$ by
setting $\Phi_A: X \mapsto AXA^{-1}$. Again $(\Phi, M_n(K))$ is an $n^2$-dimensional linear
representation. If $X = (x_{ij})$, then, as usual, we let $\operatorname{tr} X = \sum_{i=1}^{n} x_{ii}$ denote the trace of $X$.
It is well known that $\operatorname{tr}(\alpha X + \beta Y) = \alpha \operatorname{tr} X + \beta \operatorname{tr} Y$ (linearity of the trace function) and
$\operatorname{tr} \Phi_A(X) = \operatorname{tr} X$. This implies that the set $M_n^0(K)$ of matrices with zero trace is a
$\Phi$-invariant subspace. On the other hand, $\Phi_A(\lambda E) = \lambda E$ and $\operatorname{tr} \lambda E = n\lambda$. Therefore,
if $K$ is a field of characteristic zero, we have a direct sum decomposition of
$GL(n,K)$-subspaces

\[ M_n(K) = \langle E \rangle_K \oplus M_n^0(K) \tag{5} \]

of dimension 1 and $n^2 - 1$, respectively. Note that if $n = p$ and $K = \mathbb{Z}_p$, then
there is no decomposition (5), since in that case $\operatorname{tr} E = 0$.

According to the definition, the Jordan normal form $J(X)$ of a matrix $X$ is
nothing more nor less than a convenient and simple representative of the $GL(n, \mathbb{C})$-orbit
containing $X$. If we restrict $\Phi$ to a subgroup $H \subset GL(n,K)$, we have the natural
question of finding similar forms for representatives of the $H$-orbits.
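
The decomposition of a matrix into a scalar part and a traceless part, and its compatibility with conjugation, can be checked on a small case. The following is a sketch only: it works with $2 \times 2$ rational matrices, the particular matrices `A` and `X` are arbitrary choices made for the example, and the helper names are not from the text.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def inverse_2x2(A):
    """Inverse of an invertible 2 x 2 matrix with Fraction entries."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def conjugate(A, X):
    """The action X -> A X A^(-1) of Example 3, for 2 x 2 matrices."""
    return matmul(matmul(A, X), inverse_2x2(A))


def split(X):
    """Write X = (tr X / n) E + X0 with tr X0 = 0; needs n invertible in K."""
    n = len(X)
    lam = trace(X) / n
    scalar = [[lam if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    traceless = [[X[i][j] - scalar[i][j] for j in range(n)] for i in range(n)]
    return scalar, traceless


# Arbitrary sample data (assumptions for the illustration); det A = -1.
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(5)]]
X = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
```

Here `split(conjugate(A, X))` returns the same scalar part as `split(X)`, and its traceless part equals the conjugate of the traceless part of `X`, which is the invariance of the two summands in (5).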
Example 4. Set $K = \mathbb{R}$ in the previous example, and consider the restriction of
$\Phi$ to the orthogonal group $O(n)$. Since $A \in O(n) \Rightarrow {}^tA = A^{-1}$, we have

\[ {}^tX = \varepsilon X, \ \varepsilon = \pm 1 \ \Longrightarrow\ {}^t(AXA^{-1}) = {}^t(A^{-1})\,{}^tX\,{}^tA = \varepsilon\,AXA^{-1}. \]

Hence, the representation space $M_n(\mathbb{R})$ for $O(n)$ can be written as the following sum of $O(n)$-subspaces:

\[ M_n(\mathbb{R}) = \langle E \rangle_{\mathbb{R}} \oplus M_n^{+}(\mathbb{R}) \oplus M_n^{-}(\mathbb{R}), \]

i.e., the sum of the one-dimensional space $\langle E \rangle_{\mathbb{R}}$ of scalar matrices, the
$(n+2)(n-1)/2$-dimensional space of symmetric matrices with zero trace, and the
$n(n-1)/2$-dimensional space of skew-symmetric matrices. There is a well-known one-to-one
correspondence between the symmetric matrices (resp. skew-symmetric matrices) and
the symmetric (resp. skew-symmetric) bilinear forms. The action of $O(n)$ on
$\langle E \rangle_{\mathbb{R}} \oplus M_n^{+}(\mathbb{R})$ and on $M_n^{-}(\mathbb{R})$ carries over to the spaces of the corresponding forms.
The theorem on reducing a quadratic form $q(x)$ to diagonal form is equivalent to saying
that in the orbit containing $q(x)$ one can choose a diagonal form $\sum_i \lambda_i x_i^2$ with real $\lambda_i$
which are uniquely determined up to permuting their order.

If we replace $\mathbb{R}$ by $\mathbb{C}$ and replace $O(n)$ by the unitary group $U(n)$, we
obtain the decomposition

\[ M_n(\mathbb{C}) = \langle E \rangle_{\mathbb{R}} \oplus M_n^{+}(\mathbb{C}) \oplus M_n^{-}(\mathbb{C}) \]

into the direct sum of the $U(n)$-subspaces of scalar matrices, hermitian matrices with
zero trace, and skew-hermitian matrices. The case $n = 2$ was discussed in detail in
§1 Ch. 7.

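Example 5. Let $G$ be a group of permutations acting on a set $\Omega$ of cardinality
$|\Omega| = n > 1$, and let

\[ V = \langle e_i \mid i \in \Omega \rangle \]

be the vector space over a field $K$ of characteristic zero with basis indexed by the elements
of the set $\Omega$. We make $V$ into a $G$-space by setting

\[ \Phi(g)\Bigl(\sum_{i \in \Omega} \lambda_i e_i\Bigr) = \sum_{i \in \Omega} \lambda_i \Phi(g) e_i = \sum_{i \in \Omega} \lambda_i e_{g(i)} \]

($i \mapsto g(i)$ is the action of a permutation $g \in G$ on $i \in \Omega$). Since $(gh)(i) = g(h(i))$,
we obtain an $n$-dimensional linear representation of $G$. It is never irreducible, since

\[ V = \Bigl\langle \sum_{i \in \Omega} e_i \Bigr\rangle \oplus \Bigl\{ \sum_{i \in \Omega} \lambda_i e_i \Bigm| \sum_{i \in \Omega} \lambda_i = 0,\ \lambda_i \in K \Bigr\} \tag{6} \]

decomposes into the direct sum of a one-dimensional invariant subspace and an
$(n-1)$-dimensional subspace. (If $\operatorname{char} K = p > 0$ and $p \mid n$, then we no longer obtain
this direct sum.)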
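
The permutation representation of Example 5 is easy to build explicitly. The sketch below takes $G = S_3$ acting on $\{0, 1, 2\}$ (zero-based labels, an assumption made for the code), forms the matrices of $\Phi(g): e_i \mapsto e_{g(i)}$, and provides helpers to check the homomorphism property and the invariance of the two summands in (6). All function names are illustrative.

```python
from itertools import permutations


def perm_matrix(g):
    """Matrix of Phi(g): column j is the basis vector e_{g(j)}."""
    n = len(g)
    return [[1 if g[j] == i else 0 for j in range(n)] for i in range(n)]


def compose(g, h):
    """Product of permutations with the convention (gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(len(h)))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


GROUP = list(permutations(range(3)))  # the six elements of S_3 as tuples


def is_representation():
    """Check Phi(gh) = Phi(g) Phi(h) for all g, h in the group."""
    return all(perm_matrix(compose(g, h)) == matmul(perm_matrix(g), perm_matrix(h))
               for g in GROUP for h in GROUP)


def preserves_decomposition():
    """Check that <e_0 + e_1 + e_2> and the sum-zero subspace are invariant."""
    ones = [1, 1, 1]
    sum_zero_basis = [[1, -1, 0], [0, 1, -1]]
    return all(apply(perm_matrix(g), ones) == ones
               and all(sum(apply(perm_matrix(g), v)) == 0 for v in sum_zero_basis)
               for g in GROUP)
```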


We consider two special cases.

(a) $G = S_n$. The monomorphism $S_n \to GL(n, \mathbb{R})$ in Subsection 5 of §3 Ch. 4
coincides with our linear representation $\Phi$ if we take the $i$-th coordinate column
$E^{(i)}$ for $e_i$. The decomposition (6) shows that we have a more economical imbedding
$S_n \to GL(n-1, \mathbb{C})$. We shall later show that this $(n-1)$-dimensional linear representation
is irreducible (even over the field $\mathbb{C}$).

(b) The regular representation. Let $G$ be any finite group. If we set $\Omega = G$,
we obtain the so-called regular $G$-space $V = \langle e_g \mid g \in G \rangle$ and the corresponding
regular representation $(\rho, V)$ of the group $G$: $\rho(a) e_g = e_{ag}$ for all $a, g \in G$. We
already encountered the regular representation in somewhat different notation in the proof
of Cayley's theorem (§3 Ch. 4), but at that time we were just interested in the set $\{e_g\}$
and not in the space $V$. The regular representation of a finite group $G$ is important
because it contains all of the irreducible representations of $G$ (up to equivalence), as we
shall see in §5.


Example 6. A one-dimensional representation is simply a homomorphism
$\Phi: G \to K^*$ from the group $G$ to the multiplicative group of the field $K$ ($K$ is a
one-dimensional vector space over itself, and $GL(1, K) = K^*$). Since the multiplicative group
of a field is abelian, it follows that $\operatorname{Ker} \Phi \supset G'$, where $G'$ is the commutant of $G$
(Theorem 4 of §3 Ch. 7). Note that equivalence of two one-dimensional representations
$\Phi'$ and $\Phi''$ (having the same representation space) is the same as equality, since
$a\,\Phi'(g)\,a^{-1} = \Phi''(g) \Rightarrow \Phi'(g) = \Phi''(g) \Rightarrow \Phi' = \Phi''$. Suppose that $g^n = e$. Then
$\Phi(g)^n = \Phi(g^n) = \Phi(e) = 1$, i.e., $\Phi(g)$ is a root of unity. If $K = \mathbb{C}$, we shall see that every
cyclic group has a faithful one-dimensional representation. But in general it may happen
that even a homomorphism from a cyclic group to $K^*$ always has a non-trivial kernel;
for example, let $G = \mathbb{Z}_p$ and $K = \mathbb{Z}_p$, in which case always $\operatorname{Ker} \Phi = G$.

(a) $G = (\mathbb{Z}, +)$, $K = \mathbb{C}$. The representation $k \mapsto \lambda^k$ is faithful if $|\lambda| \neq 1$. If
$|\lambda| = 1$, then by Euler's formula $\lambda = e^{2\pi i \theta}$, $\theta \in \mathbb{R}$, and the map $k \mapsto e^{2\pi i \theta k}$ has
a non-trivial kernel if and only if $\theta \in \mathbb{Q}$.




To find complex representations of the group $\mathbb{Z}$ of arbitrarily high dimension
which are indecomposable (but not irreducible), we can use the Jordan normal form of a
matrix, and consider the map

\[ k \mapsto \begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
0 & 1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 1 \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}^{k}. \]

(b) $G = \langle a \mid a^n = e \rangle$, $K = \mathbb{C}$. Let $\varepsilon$ be a primitive $n$-th root of
one. Out of the $n$ one-dimensional representations

\[ \Phi^{(m)}: a^k \mapsto \varepsilon^{mk}, \qquad m = 0, 1, \ldots, n-1, \tag{7} \]

exactly $\varphi(n)$ of them are faithful. We note the following interesting fact: a cyclic group
of order $n$ has exactly $n$ non-equivalent irreducible representations over $\mathbb{C}$. They
are all one-dimensional, and they have the form (7). Indeed, it suffices to show that a
finite cyclic group has no irreducible complex representations of dimension $> 1$. But
before giving Definition 3 we noted that any linear operator $\Phi(g)$ of finite order is
diagonalizable over $\mathbb{C}$. In the present situation, this gives us complete reducibility of $\Phi$.
If $\dim \Phi = r$, then $\Phi$ decomposes into a direct sum of $r$ one-dimensional
representations.
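
The count of faithful one-dimensional representations can be checked directly. In the sketch below (illustrative names, not from the text) the representation $\Phi^{(m)}$ is encoded only by the exponent of $\varepsilon$, so that $\Phi^{(m)}(a^k) = 1$ exactly when $mk \equiv 0 \pmod n$; no floating-point arithmetic is needed.

```python
from math import gcd


def character_exponent(n, m, k):
    """Exponent t with Phi^(m)(a^k) = eps^t, eps a primitive n-th root of one."""
    return (m * k) % n


def kernel(n, m):
    """Exponents k in 0..n-1 such that Phi^(m)(a^k) = 1."""
    return [k for k in range(n) if character_exponent(n, m, k) == 0]


def faithful_characters(n):
    """Indices m in 0..n-1 for which Phi^(m) has trivial kernel."""
    return [m for m in range(n) if kernel(n, m) == [0]]


def euler_phi(n):
    """Euler function: the number of k in 1..n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
```

For $n = 12$ the faithful representations are those with $m = 1, 5, 7, 11$, in agreement with $\varphi(12) = 4$.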

We have thus obtained a description of all complex linear representations of a cyclic
group of finite order. Up to equivalence we have

\[ \Phi = \begin{pmatrix}
\Phi^{(i_1)} & & 0 \\
 & \ddots & \\
0 & & \Phi^{(i_r)}
\end{pmatrix}, \]

where each $\Phi^{(i_k)}$ is one of the representations of the form (7).

We would like to establish similar rules for more general cases.


Example 7. We have already noted in the above examples how the properties of a
linear representation $\Phi$ of a group $G$ can depend strongly on the ground field $K$. We
now clarify this question somewhat.

If we let the cyclic group $G = \langle a \mid a^p = e \rangle$ of prime order $p$ act on the
two-dimensional vector space $V = \langle v_1, v_2 \rangle$ over a field $K$ of characteristic $p$ according
to the rule $a * v_1 = v_1$, $a * v_2 = v_1 + v_2$, we obtain an indecomposable representation
$(\Phi, V)$:

\[ \Phi_{a^k} = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}, \qquad 0 \le k \le p-1. \]

In fact, the matrix $\Phi_a$ has characteristic root 1 with multiplicity 2. Hence, if $\Phi$
decomposed into a direct sum of two one-dimensional representations, there would exist an
invertible matrix $C$ for which $C \Phi_a C^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = E$. But then $\Phi_a = C^{-1} E C = E$,
which is false.
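
The behaviour in characteristic $p$ can be verified by direct computation modulo $p$. The following sketch uses illustrative helper names and the matrix with rows $(1, 1)$ and $(0, 1)$; it computes powers of this matrix over $\mathbb{Z}_p$ and lists the lines in $\mathbb{Z}_p^2$ that the matrix maps into themselves. Only one such line exists, so there is no pair of complementary invariant lines.

```python
def matmul_mod(A, B, p):
    """Product of 2 x 2 matrices with entries reduced modulo p."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]


def power_mod(A, k, p):
    """A^k modulo p by repeated multiplication (k >= 0)."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul_mod(result, A, p)
    return result


def invariant_lines(A, p):
    """One spanning vector for each line in Z_p^2 that A maps into itself."""
    representatives = [(1, y) for y in range(p)] + [(0, 1)]
    found = []
    for x, y in representatives:
        ax = (A[0][0] * x + A[0][1] * y) % p
        ay = (A[1][0] * x + A[1][1] * y) % p
        # (ax, ay) is proportional to (x, y) exactly when the determinant vanishes.
        if (ax * y - ay * x) % p == 0:
            found.append((x, y))
    return found


PHI_A = [[1, 1], [0, 1]]  # matrix of the generator a in the basis v_1, v_2
```

For $p = 5$, `power_mod(PHI_A, 5, 5)` is the identity matrix and `invariant_lines(PHI_A, 5)` contains only the vector `(1, 0)`, i.e., the line spanned by $v_1$.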
Now let $G = \langle a \mid a^3 = e \rangle$ be a cyclic group of order 3, and let $K = \mathbb{R}$. The
two-dimensional representation $(\Phi, V)$, $V = \langle v_1, v_2 \rangle$, which is defined in this basis by
the matrix

\[ \Phi_a = \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix} \]

is irreducible, since the characteristic polynomial $t^2 + t + 1$ of this matrix does not
have real roots. But if we consider $V$ over $\mathbb{C}$, then, of course, $V$ decomposes into
a sum of one-dimensional $G$-subspaces

\[ V = \langle v_1 + \varepsilon^{-1} v_2 \rangle \oplus \langle v_1 + \varepsilon v_2 \rangle \]

and we have

\[ \Phi_a \approx \begin{pmatrix} \varepsilon & 0 \\ 0 & \varepsilon^{-1} \end{pmatrix}, \qquad \varepsilon^2 + \varepsilon + 1 = 0. \]

Thus, we may lose irreducibility of a representation when we extend the field.
In what follows, with rare exceptions, we shall take the ground field $K$ to be the
field of complex numbers (which is the most important case from a practical point of view),
or else an arbitrary algebraically closed field of characteristic zero.
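
The loss of irreducibility under field extension can be seen numerically for a cyclic group of order 3. The sketch below takes, as an assumption for the illustration, the integer matrix with rows $(-1, -1)$ and $(1, 0)$, whose characteristic polynomial is $t^2 + t + 1$. It checks that the matrix has order 3, that the discriminant of the characteristic polynomial is negative (no real eigenvalues, hence no invariant real line), and that over $\mathbb{C}$ the vectors $v_1 + \varepsilon^{\pm 1} v_2$ are eigenvectors.

```python
import cmath

M = [[-1, -1], [1, 0]]  # assumed sample matrix with characteristic polynomial t^2 + t + 1
EPS = cmath.exp(2j * cmath.pi / 3)  # a primitive cube root of one


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(A, v):
    """Matrix-vector product A v for a 2 x 2 matrix."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def char_poly(A):
    """Coefficients (1, -tr A, det A) of the characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return (1, -tr, det)


def discriminant(coeffs):
    a, b, c = coeffs
    return b * b - 4 * a * c


def is_eigenvector(A, v, lam, tol=1e-9):
    """Check A v = lam v up to a small numerical tolerance."""
    w = apply(A, v)
    return all(abs(w[i] - lam * v[i]) < tol for i in range(2))
```

With these definitions, `is_eigenvector(M, [1, EPS ** -1], EPS)` and `is_eigenvector(M, [1, EPS], EPS ** -1)` both hold, while `discriminant(char_poly(M))` is negative.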
EXERCISES 


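1. The group $SO(2)$ is defined by its natural two-dimensional representation

\[ \Phi'(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \]

which is irreducible over $\mathbb{R}$. Verify that

\[ A\,\Phi'(\theta)\,A^{-1} = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix} \quad \text{for} \quad A = \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \in GL(2, \mathbb{C}). \]

Hence, $\Phi'$ is a direct sum of two non-equivalent (which in this situation simply means
distinct) one-dimensional representations.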
2. Is the $GL(n, \mathbb{C})$-space $M_n^0(\mathbb{C})$ in the decomposition (5) irreducible when
$n = 2$ and 3? (Answer: yes.)


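3. Let $\Phi$ and $\Psi$ be irreducible complex representations of a cyclic group
$\langle a \mid a^n = e \rangle$ of order $n$. Show that

\[ \frac{1}{n} \sum_{k=0}^{n-1} \Phi(a^k)\,\overline{\Psi(a^k)} =
\begin{cases} 1, & \text{if } \Phi \approx \Psi, \\ 0, & \text{if } \Phi \not\approx \Psi. \end{cases} \]

4. Use Exercise 3 to prove the following assertion. Any complex-valued function
$f$ on a finite cyclic group $\langle a \mid a^n = e \rangle$ can be expanded "in simple harmonics" as follows:

\[ f(a^k) = \sum_{m=0}^{n-1} c_m\,\varepsilon^{mk}, \qquad \varepsilon = e^{2\pi i/n}. \]

The "Fourier coefficients" $c_m$ are computed according to the formula

\[ c_m = \frac{1}{n} \sum_{k=0}^{n-1} f(a^k)\,\varepsilon^{-mk}. \]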
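
The statements of Exercises 3 and 4 can be tested numerically before they are proved. The sketch below is not a solution, only a floating-point check with illustrative function names: a function on the cyclic group is given as the list of its values $f(a^0), \ldots, f(a^{n-1})$, the coefficients are computed as $c_m = \frac{1}{n}\sum_k f(a^k)\,\varepsilon^{-mk}$, and the expansion is then summed back.

```python
import cmath


def root_of_unity(n):
    """eps = exp(2 pi i / n)."""
    return cmath.exp(2j * cmath.pi / n)


def fourier_coefficients(values):
    """c_m = (1/n) * sum_k f(a^k) * eps^(-m k), for m = 0..n-1."""
    n = len(values)
    eps = root_of_unity(n)
    return [sum(values[k] * eps ** (-m * k) for k in range(n)) / n
            for m in range(n)]


def expand(coeffs):
    """f(a^k) = sum_m c_m * eps^(m k), for k = 0..n-1."""
    n = len(coeffs)
    eps = root_of_unity(n)
    return [sum(coeffs[m] * eps ** (m * k) for m in range(n))
            for k in range(n)]


def inner_product(n, m1, m2):
    """(1/n) * sum_k Phi^(m1)(a^k) * conj(Phi^(m2)(a^k))."""
    eps = root_of_unity(n)
    return sum(eps ** (m1 * k) * (eps ** (m2 * k)).conjugate()
               for k in range(n)) / n
```

Up to rounding error, `expand(fourier_coefficients(values))` reproduces `values`, and `inner_product(n, m1, m2)` is 1 when `m1 == m2` and 0 otherwise.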


5. Use the formula for the number of necklaces (see the beginning of the chapter) to
prove: (a) $q^p - q \equiv 0 \pmod p$ (Fermat's Little Theorem; see §4 Ch. 4);
(b) $\sum_{d \mid n} \varphi(d) = n$.


§2. Unitary and reducible representations 


1. Unitary representations. Recall from linear algebra that a non-degenerate form
$(u,v) \mapsto (u|v)$ on a vector space $V$ over $\mathbb{C}$ is called hermitian if

\[ \begin{aligned}
(u|v) &= \overline{(v|u)}, \\
(\alpha u + \beta v \,|\, w) &= \alpha (u|w) + \beta (v|w), \\
(v|v) &> 0 \ \text{ for all } v \neq 0
\end{aligned} \tag{1} \]

(as always, $z \mapsto \bar{z}$ denotes complex conjugation). The space $V$, considered together
with a non-degenerate hermitian form $(u|v)$, is called a hermitian space. The analog
over $\mathbb{R}$ is euclidean space with a scalar product given by a non-degenerate symmetric
bilinear form. If we take a basis $e_1, \ldots, e_n$ for $V$, we can write the form $(u|v)$
on $u = \sum_i u_i e_i$ and $v = \sum_j v_j e_j$ as follows:

\[ (u|v) = \sum_{i,j} h_{ij}\, u_i \bar{v}_j. \]

The matrix $H = (h_{ij})$ satisfies the condition ${}^t\bar{H} = H$. Such a matrix is also called
hermitian. We have already used this terminology in §1 Ch. 7.




There exists an orthonormal basis (i.e., such that $(e_i|e_j) = \delta_{ij}$) relative to
which

\[ (u|v) = \sum_{i=1}^{n} u_i \bar{v}_i. \]

A linear operator $\mathcal{A}: V \to V$ which preserves this form, i.e., such that $(\mathcal{A}u|\mathcal{A}v) = (u|v)$,
is called a unitary operator. The analog over $\mathbb{R}$ is the orthogonal operators. In
Chapter 7 we already encountered the unitary condition, written in matrix form, i.e.,
$A \cdot A^* = E$, where $A^* = {}^t\bar{A}$, $A = (a_{ij})$. If we let $\mathcal{A}^*$ denote the linear operator
with matrix ${}^t\bar{A} = A^*$, then we can express the unitary condition in the form
$\mathcal{A} \cdot \mathcal{A}^* = \mathcal{E} = \mathcal{A}^* \cdot \mathcal{A}$.

It is customary to let $U(n)$ denote the group of all unitary matrices (also called the
group of unitary operators, or simply the unitary group). By definition, $U(n) \subset GL(n, \mathbb{C})$.
If a representation $\Phi: G \to GL(n, \mathbb{C})$ has the property that $\operatorname{Im} \Phi \subset U(n)$, then $(\Phi, V)$
is called a unitary representation.


THEOREM 1. If $G$ is a finite group, then every linear representation $(\Phi, V)$ of
$G$ over $\mathbb{C}$ is equivalent to a unitary representation.

Proof. In the representation space $V$ choose any non-degenerate hermitian form
$H: (u,v) \mapsto H(u,v)$, for example $H(u,v) = \sum_i u_i \bar{v}_i$ (in terms of a basis $e_1, \ldots, e_n$ for $V$). Consider
the form $(u|v)$ obtained from $H(u,v)$ by "averaging over $G$":

\[ (u|v) = |G|^{-1} \sum_{g \in G} H(\Phi(g)u, \Phi(g)v). \tag{2} \]

The factor $|G|^{-1}$ is not essential, and is only inserted so that, if $H$ is already invariant
under all of the operators $\Phi(g)$, we get $(u|v) = H(u,v)$. Since

\[ \begin{aligned}
H(\Phi(g)u, \Phi(g)v) &= \overline{H(\Phi(g)v, \Phi(g)u)}, \\
H(\Phi(g)(\alpha u + \beta v), \Phi(g)w) &= H(\alpha\Phi(g)u + \beta\Phi(g)v, \Phi(g)w) \\
&= \alpha H(\Phi(g)u, \Phi(g)w) + \beta H(\Phi(g)v, \Phi(g)w), \\
H(\Phi(g)v, \Phi(g)v) &> 0
\end{aligned} \]

for $v \neq 0$ and all $g \in G$, it follows that the form (2) satisfies the conditions in (1),
and so is a non-degenerate hermitian form. In addition (and this is what is most important),

\[ \begin{aligned}
(\Phi(g)u \,|\, \Phi(g)v) &= |G|^{-1} \sum_{h \in G} H(\Phi(h)\Phi(g)u, \Phi(h)\Phi(g)v) \\
&= |G|^{-1} \sum_{h \in G} H(\Phi(hg)u, \Phi(hg)v) \\
&= |G|^{-1} \sum_{t \in G} H(\Phi(t)u, \Phi(t)v) = (u|v),
\end{aligned} \]

i.e., for any $g \in G$ the operator $\Phi(g)$ leaves the form $(u|v)$ invariant. Choose a
basis $e_1, \ldots, e_n$ in $V$ which is orthonormal relative to the form $(u|v)$. Then the
matrices $\Phi_g$ of the operators $\Phi(g)$ will be unitary in this basis. □
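
The averaging construction (2) can be carried out by hand in a small case. The sketch below treats the real analog: a symmetric bilinear form $u, v \mapsto {}^tu\,S\,v$ on $\mathbb{Q}^2$ and, as an assumed example, the cyclic group of order 3 generated by the integer matrix with rows $(-1, -1)$ and $(1, 0)$. Replacing $u, v$ by $gu, gv$ replaces $S$ by ${}^tg\,S\,g$, so the averaged form has matrix $|G|^{-1}\sum_g {}^tg\,S\,g$. Function names are illustrative.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


IDENTITY = [[1, 0], [0, 1]]
GENERATOR = [[-1, -1], [1, 0]]  # assumed example: an integer matrix of order 3
GROUP = [IDENTITY, GENERATOR, matmul(GENERATOR, GENERATOR)]


def averaged_form(S, group):
    """Matrix of the averaged form: (1/|G|) * sum over g of g^T S g."""
    n = len(S)
    total = [[Fraction(0)] * n for _ in range(n)]
    for g in group:
        term = matmul(matmul(transpose(g), S), g)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return [[entry / len(group) for entry in row] for row in total]


def is_invariant(S, g):
    """True if the form with matrix S satisfies (gu|gv) = (u|v)."""
    return matmul(matmul(transpose(g), S), g) == S
```

Starting from the standard form `IDENTITY`, which is not invariant under `GENERATOR`, the averaged form has matrix with rows $(4/3, 2/3)$ and $(2/3, 4/3)$; it is positive definite and invariant under every element of the group.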


Remarks. (1) Theorem 1 does not follow automatically from the (much weaker)
fact that we knew before which says that each individual matrix $\Phi_g$ with $g^q = e$ is
similar to a unitary matrix $\operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ with $\lambda_i^q = 1$.

(2) In the real case, a completely analogous argument shows that every linear
representation of a finite group is equivalent to an orthogonal representation.

(3) For a variety of reasons, unitary representations play an important role in
applications of representation theory. Remarkably, Theorem 1 remains true for a much
broader class of groups, for example, for $G = U(n)$ or $O(n)$. The proof is the same,
except that the summation over the elements of $G$ is replaced by integration (suitably
defined) over the compact group $G$. Recall that the compact group $SU(2)$ is
geometrically indistinguishable from the three-dimensional sphere $S^3$, and so it makes sense
to speak, for example, about its volume. In general, there is a remarkable parallel in
representation theory between finite and compact groups, but we cannot dwell on this here.
It is clear from Example 6a of §1 that representations of non-compact groups (such as
$G = \mathbb{Z}$) need not be unitary.


In conclusion, we note that, while the proof of Theorem 1 is constructive, it would
not be very practical to use it to find a unitary realization of a given representation. For
example, if $G$ is generated by elements $a_1, \ldots, a_d$, then it would be sufficient to
find a representation for which the matrices $\Phi_{a_1}, \ldots, \Phi_{a_d}$ are unitary, since in that
case $\operatorname{Im} \Phi = \langle \Phi_{a_1}, \ldots, \Phi_{a_d} \rangle \subset U(n)$.
a, a ; 


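Example 1. The symmetric group $S_3 = \langle (12), (123) \rangle$ has a two-dimensional
representation which is a direct summand in the natural three-dimensional
representation (see Example 5 of §1). Namely, if $\Phi(\pi) e_i = e_{\pi(i)}$, $i = 1, 2, 3$, and
$f_1 = e_1 - e_3$, $f_2 = e_2 - e_3$, then

\[ \begin{aligned}
\Phi((12)) f_1 &= e_2 - e_3 = f_2, & \Phi((12)) f_2 &= e_1 - e_3 = f_1, \\
\Phi((123)) f_1 &= e_2 - e_1 = -f_1 + f_2, & \Phi((123)) f_2 &= e_3 - e_1 = -f_1.
\end{aligned} \]

Since $\pi = (123)^i (12)^j$, where $i = 0, 1,$ or 2 and $j = 0$ or 1, we easily obtain
all of the matrices $\Phi_\pi$:

\[ \begin{aligned}
e &\mapsto \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, &
(12) &\mapsto \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, &
(13) &\mapsto \begin{pmatrix} -1 & -1 \\ 0 & 1 \end{pmatrix}, \\
(23) &\mapsto \begin{pmatrix} 1 & 0 \\ -1 & -1 \end{pmatrix}, &
(123) &\mapsto \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix}, &
(132) &\mapsto \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}.
\end{aligned} \]

Since $\det \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix} = 1$ and $(123)^3 = e$, it follows that

\[ C \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix} C^{-1} = \begin{pmatrix} \varepsilon & 0 \\ 0 & \varepsilon^{-1} \end{pmatrix}, \qquad \varepsilon = e^{2\pi i/3}, \]

for some non-singular matrix $C$. If we conjugate $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ by $C$, we do not lose the
unitary property of this matrix. Solving the linear equations

\[ C \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \varepsilon & 0 \\ 0 & \varepsilon^{-1} \end{pmatrix} C, \qquad
C \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} C \]

for the entries of $C$, we obtain (up to a scalar factor):

\[ C = \begin{pmatrix} 1 & -\varepsilon^{-1} \\ -\varepsilon^{-1} & 1 \end{pmatrix}. \]

We can now write out a table of all of the unitary representations of $S_3$ which we
know: the trivial representation $\Phi^{(1)}$, the representation $\Phi^{(2)}: \pi \mapsto \operatorname{sgn} \pi \in \{\pm 1\}$,
and the two-dimensional representation $\Phi^{(3)}$ which we just found. The following table is
convenient for future reference:

\[ \begin{array}{c|cccccc}
 & e & (12) & (13) & (23) & (123) & (132) \\ \hline
\Phi^{(1)} & 1 & 1 & 1 & 1 & 1 & 1 \\
\Phi^{(2)} & 1 & -1 & -1 & -1 & 1 & 1 \\
\Phi^{(3)} &
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} &
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} &
\begin{pmatrix} 0 & \varepsilon \\ \varepsilon^{-1} & 0 \end{pmatrix} &
\begin{pmatrix} 0 & \varepsilon^{-1} \\ \varepsilon & 0 \end{pmatrix} &
\begin{pmatrix} \varepsilon & 0 \\ 0 & \varepsilon^{-1} \end{pmatrix} &
\begin{pmatrix} \varepsilon^{-1} & 0 \\ 0 & \varepsilon \end{pmatrix}
\end{array} \]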

Example 2. The epimorphism $\Phi: SU(2) \to SO(3)$ that was constructed in §1
Ch. 7 can be thought of as a natural orthogonal representation of the infinite group $SU(2)$.

2. Complete reducibility. The following fact is fundamental, as should be clear
from the remarks and definitions in §1.

THEOREM 2. Every linear representation of a finite group $G$ over a field $K$
of characteristic zero or of characteristic not dividing $|G|$ is completely reducible.

Recall that this means that the representation $(\Phi, V)$ can be decomposed into a
direct sum of irreducible representations. Actually, the classical version of Theorem 2
is as follows:

Every $G$-invariant subspace $U \subset V$ has a $G$-invariant complement $W$:
\[ V = U \oplus W. \tag{3} \]

It is this assertion which we shall prove. Theorem 2 will then follow immediately,
since either $(\Phi, V)$ is irreducible, in which case there is nothing to prove, or else there
exists a proper $G$-invariant subspace $U$, in which case (3) holds for some $G$-subspace
$W$. Then $\dim U < \dim V$ and $\dim W < \dim V$. Applying the same argument to $U$
and $W$, and using induction on the dimension of the representation, we obtain the required
decomposition into irreducible components.

Thus, it suffices to prove that every $G$-invariant $U$ has a $G$-invariant
complement. As usual, we are most interested in the case $K = \mathbb{C}$. So it is useful to give
two independent proofs.

First proof ($K = \mathbb{C}$). By Theorem 1, there exists a non-degenerate hermitian
form $(u|v)$ on the representation space $V$ which is invariant with respect to the linear
operators $\Phi(g)$. For every subspace $U \subset V$ there exists an orthogonal complement

\[ U^{\perp} = \{ v \in V \mid (u|v) = 0, \ \forall u \in U \}, \]

and, as is well known from linear algebra, we have

\[ V = U \oplus U^{\perp}, \]

and also $(U^{\perp})^{\perp} = U$. Now suppose that $U$ is a $G$-subspace of $V$, i.e., that
$\Phi(g)U \subset U$ for all $g \in G$. Since $\Phi(g)|_U$ is an automorphism, it follows that any
element $u \in U$ can be written in the form $u = \Phi(g)u'$, $u' \in U$. We now use the
invariance of the form $(u|v)$:

\[ v \in U^{\perp} \ \Longrightarrow\ (u \,|\, \Phi(g)v) = (\Phi(g)u' \,|\, \Phi(g)v) = (u'|v) = 0. \]

Thus, $v \in U^{\perp} \Rightarrow \Phi(g)v \in U^{\perp}$. Setting $W = U^{\perp}$, we obtain (3). □


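Second proof. As before, let $U$ be a subspace of $V$ which is invariant under
the action of $G$. Consider the direct sum

\[ V = U \oplus U', \]

where $U'$ is any complement of $U$. In general, $U'$ is not $G$-invariant. Consider
the projection $\mathcal{P}: V \to U'$, which is defined by $\mathcal{P}v = u'$ for every vector $v = u + u'$.
We have

\[ v - \mathcal{P}v \in U, \qquad \mathcal{P}(U) = 0, \qquad \mathcal{P}^2 = \mathcal{P}. \tag{4} \]

We now introduce the linear "averaging" operator

\[ \mathcal{P}_0 = |G|^{-1} \sum_{h \in G} \Phi(h)\,\mathcal{P}\,\Phi(h^{-1}) \]

(by our assumption regarding $\operatorname{char} K$, we are allowed to divide by $|G|$). We have

\[ \Phi(g)\,\mathcal{P}_0 = \mathcal{P}_0\,\Phi(g), \qquad \forall g \in G. \tag{5} \]

To establish (5), we verify that

\[ \begin{aligned}
\Phi(g)\,\mathcal{P}_0\,\Phi(g^{-1}) &= |G|^{-1} \sum_{h \in G} \Phi(g)\Phi(h)\,\mathcal{P}\,\Phi(h^{-1})\Phi(g^{-1}) \\
&= |G|^{-1} \sum_{h \in G} \Phi(gh)\,\mathcal{P}\,\Phi((gh)^{-1}) = |G|^{-1} \sum_{t \in G} \Phi(t)\,\mathcal{P}\,\Phi(t^{-1}) = \mathcal{P}_0,
\end{aligned} \]

as required. We set

\[ W = \mathcal{P}_0(V) = \{ \mathcal{P}_0 v \mid v \in V \}. \]

According to (5), we have $\Phi(g)w = \Phi(g)\mathcal{P}_0 v = \mathcal{P}_0\Phi(g)v = \mathcal{P}_0 v' = w' \in W$ for every
$w \in W$, so that the vector subspace $W \subset V$ is actually a $G$-subspace.

It remains to show that $V = U \oplus W$. Since $\Phi(h^{-1})v - \mathcal{P}\Phi(h^{-1})v \in U$ (see (4)),
it follows that $v - \Phi(h)\mathcal{P}\Phi(h^{-1})v = \Phi(h)\{\Phi(h^{-1})v - \mathcal{P}\Phi(h^{-1})v\} \in \Phi(h)U = U$ (by the
invariance of $U$). Consequently,

\[ v - \mathcal{P}_0 v = |G|^{-1} \sum_{h \in G} \bigl(v - \Phi(h)\,\mathcal{P}\,\Phi(h^{-1})v\bigr) = u \in U, \]

and we obtain $v = u + w$ with $u \in U$, $w = \mathcal{P}_0 v \in W$.

Next, we have $\Phi(h^{-1})U \subset U \Rightarrow \mathcal{P}\Phi(h^{-1})U = 0$ (by (4)) $\Rightarrow \Phi(h)\mathcal{P}\Phi(h^{-1})U = 0
\Rightarrow \mathcal{P}_0(U) = 0$. Hence, $v - \mathcal{P}_0 v = u \Rightarrow \mathcal{P}_0(v - \mathcal{P}_0 v) = 0$, so that $\mathcal{P}_0^2 v = \mathcal{P}_0 v$
for all $v \in V$. This means that $\mathcal{P}_0$ is projection along $U$ onto $W$:

\[ \mathcal{P}_0(U) = 0, \qquad \mathcal{P}_0^2 = \mathcal{P}_0. \tag{6} \]

Now $v \in U \cap W \Rightarrow \mathcal{P}_0 v = 0$, since $v \in U$, and $v = \mathcal{P}_0 v'$, since $v \in W = \mathcal{P}_0(V)$.
Using (6), we obtain $0 = \mathcal{P}_0 v = \mathcal{P}_0^2 v' = \mathcal{P}_0 v' = v \Rightarrow U \cap W = 0$. □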
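
The averaging operator of the second proof can be computed explicitly in a small case. The following sketch makes several assumptions for the sake of illustration: $G = S_3$ acts on $\mathbb{Q}^3$ by permutation matrices, $U$ is the line spanned by $(1,1,1)$, and $U'$ is the non-invariant complement spanned by $e_2, e_3$, so that the projection $\mathcal{P}$ onto $U'$ along $U$ sends $(a,b,c)$ to $(0, b-a, c-a)$. The code forms $|G|^{-1}\sum_h \Phi(h)\mathcal{P}\Phi(h^{-1})$ with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations


def perm_matrix(g):
    """Matrix of Phi(g): column j is the basis vector e_{g(j)}."""
    n = len(g)
    return [[1 if g[j] == i else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


GROUP = [perm_matrix(g) for g in permutations(range(3))]

# Projection onto U' = <e_2, e_3> along U = <(1, 1, 1)>: (a, b, c) -> (0, b - a, c - a).
P = [[0, 0, 0], [-1, 1, 0], [-1, 0, 1]]


def averaged_projection(P, group):
    """P_0 = (1/|G|) * sum over h of Phi(h) P Phi(h)^(-1)."""
    n = len(P)
    total = [[Fraction(0)] * n for _ in range(n)]
    for h in group:
        h_inv = transpose(h)  # the inverse of a permutation matrix is its transpose
        term = matmul(matmul(h, P), h_inv)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return [[entry / len(group) for entry in row] for row in total]


P0 = averaged_projection(P, GROUP)
```

In this example `P0` has diagonal entries $2/3$ and off-diagonal entries $-1/3$: it is the projection along $U$ onto the subspace of vectors with coordinate sum zero, and it commutes with every permutation matrix.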


We would not be justified in making the stronger assertion that the decomposition into
irreducible components (irreducible $G$-spaces) is unique. For example, if $\Phi(g) = \mathcal{E}$ is
the identity for all $g \in G$, then any direct sum decomposition of $V$ into one-dimensional
subspaces is a decomposition into irreducible $G$-spaces, and there are infinitely many
such decompositions. But suppose that we group together all of the isomorphic irreducible
components, and write

\[ V = U_1 \oplus \cdots \oplus U_s. \]

Since we do not distinguish between isomorphic $G$-spaces, we may suppose that

\[ U_i = V_i \oplus \cdots \oplus V_i = n_i V_i, \]

where $n_i$ is the multiplicity with which the irreducible component $V_i$ occurs in $V$. We
shall see that these multiplicities are uniquely determined.




EXERCISES 


1. Every one-dimensional continuous representation of the group $(\mathbb{R},+)$ (i.e., such that nearby numbers correspond to nearby operators) has the form

$$\Phi^{(\alpha)} : t \mapsto e^{i\alpha t},$$

where $\alpha$ is a complex number. Show that $\Phi^{(\alpha)}$ is unitary if and only if $\alpha \in \mathbb{R}$.

2. The kernel of the homomorphism

$$f : t \mapsto \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$$

from the group $(\mathbb{R},+)$ to SO(2) consists of the numbers $t = 2\pi m$, $m \in \mathbb{Z}$. Thus, $SO(2) \cong \mathbb{R}/2\pi\mathbb{Z}$, and to every irreducible unitary representation $\Phi$ of the group SO(2) (by the results in §4, such a representation must be one-dimensional) there corresponds an irreducible unitary representation $\tilde\Phi : t + 2\pi m \mapsto \Phi(t)$, $0 \le t < 2\pi$, of the group $(\mathbb{R},+)$, for which $\tilde\Phi(2\pi) = \tilde\Phi(0) = 1$. Use Exercise 1 to show that $\tilde\Phi = \Phi^{(n)}$ for some $n \in \mathbb{Z}$. Together with Remark 3) in Subsection 1, this means that every irreducible representation of SO(2) has the form $\Phi^{(n)} : t \mapsto e^{int}$, $n \in \mathbb{Z}$. Verify that

$$\frac{1}{2\pi} \int_0^{2\pi} e^{ikt}\, \overline{e^{ilt}}\, dt = \delta_{kl}$$

(compare with the relation in Exercise 3 of §1: the order $n$ has been replaced by the "volume" $2\pi$ of the group SO(2)). In analysis, the set of functions $\{e^{int}\}$ is the classic example of a complete orthonormal system of periodic functions (i.e., functions on the circle $S^1 \cong SO(2)$). This is the point of departure for the theory of Fourier series.


3. Use Theorem 2 to prove that any faithful two-dimensional complex representation of a finite non-abelian group is irreducible.


§3. Finite rotation groups 


In this section we shall be concerned with finite subgroups of the group SO(3). In the process of determining these groups, we shall obtain the irreducible orthogonal representations of such groups as $A_4$, $S_4$, $A_5$ in an easily remembered geometrical setting. Subsection 1 and the proof of Theorem 2 can be omitted in a first reading, but the reader who really wants to be sure of having a firm grasp of the idea of "group actions" (§2 Ch. 7) would be well-advised to become familiar with the entire section.


1. The orders of finite subgroups of SO(3). According to Euler's theorem of linear algebra, every element $\mathcal{A} \in SO(3)$, $\mathcal{A} \ne \mathcal{E}$, is a rotation in $\mathbb{R}^3$ about some axis. In other words, there are precisely two points on the two-dimensional unit sphere $S^2$ which are left fixed by $\mathcal{A}$, namely, the points of intersection of the sphere and the axis. These two points are called the poles of the rotation $\mathcal{A}$.

Now let $G$ be a finite subgroup of SO(3), and let $S$ be the set of poles of all of the rotations (besides the identity) in $G$. It is clear that $G$ acts like a permutation group on the set $S$. If $x$ is a pole for some rotation $\mathcal{A} \ne \mathcal{E}$, $\mathcal{A} \in G$, then for any $\mathcal{B} \in G$ we have

$$(\mathcal{B}\mathcal{A}\mathcal{B}^{-1})\,\mathcal{B}x = \mathcal{B}\mathcal{A}x = \mathcal{B}x ,$$

i.e., $\mathcal{B}x$ is a pole for $\mathcal{B}\mathcal{A}\mathcal{B}^{-1}$, and so $\mathcal{B}x \in S$. We let $\Omega$ denote the set of all ordered pairs $(\mathcal{A}, x)$, where $\mathcal{A} \in G$, $\mathcal{A} \ne \mathcal{E}$, and $x$ is a pole for $\mathcal{A}$. Further, let $G_x$ be the stationary subgroup (stabilizer) of the point $x$, i.e., the subgroup of all elements of $G$ which leave $x$ fixed. If

$$G = G_x \cup g_2 G_x \cup \cdots \cup g_{m_x} G_x$$

is the partition of $G$ into left cosets of $G_x$, then the G-orbit of $x$ is the set

$$G(x) = \{x, g_2 x, \ldots, g_{m_x} x\}$$

containing $|G(x)| = m_x$ elements. By Lagrange's theorem, $N = m_x n_x$, where $N = |G|$ and $n_x = |G_x|$ (we are using somewhat different notation from that in §1 Ch. 7). Note that $n_x$ is the order of a cyclic subgroup of $G$, each of whose elements is a rotation about the axis through $x$. We say that $n_x$ is the multiplicity of the pole $x$, or that $x$ is an $n_x$-pole.

Every element $\mathcal{A} \ne \mathcal{E}$ in $G$ has two poles; hence $|\Omega| = 2(N - 1)$.

On the other hand, for each pole $x$ there are $n_x - 1$ elements (besides the identity) in $G$ which leave $x$ fixed. Consequently, the number of pairs $(\mathcal{A}, x)$ is equal to the sum

$$|\Omega| = \sum_{x \in S} (n_x - 1).$$

If we let $\{x_1, \ldots, x_k\}$ be a set of poles taken one from each orbit, set $n_i = n_{x_i}$ and $m_i = m_{x_i}$, and note that $n_x = n_i$ for all $x \in G(x_i)$, we obtain

$$|\Omega| = \sum_{i=1}^{k} m_i (n_i - 1) = \sum_{i=1}^{k} (N - m_i).$$

Thus,

$$2(N - 1) = \sum_{i=1}^{k} (N - m_i).$$

Dividing through by $N$, we obtain

$$2 - \frac{2}{N} = \sum_{i=1}^{k} \left(1 - \frac{1}{n_i}\right). \tag{1}$$

We suppose that $N > 1$, so that $1 \le 2 - \frac{2}{N} < 2$. Since $n_i \ge 2$, we have $\frac{1}{2} \le 1 - \frac{1}{n_i} < 1$, and so $k$ must equal 2 or 3.
i 


Case 1. $k = 2$. Then

$$2 - \frac{2}{N} = \left(1 - \frac{1}{n_1}\right) + \left(1 - \frac{1}{n_2}\right),$$

or, equivalently,

$$2 = \frac{N}{n_1} + \frac{N}{n_2} = m_1 + m_2 ,$$

so that $m_1 = m_2 = 1$ and $n_1 = n_2 = N$. Hence, $G$ has precisely one axis of rotation, and $G = C_N$ is a cyclic group of order $N$.

Case 2. $k = 3$. To be definite, suppose that $n_1 \le n_2 \le n_3$. If we had $n_1 \ge 3$, then we would have

$$\sum_{i=1}^{3} \left(1 - \frac{1}{n_i}\right) \ge \sum_{i=1}^{3} \left(1 - \frac{1}{3}\right) = 2,$$

which is impossible. Thus, $n_1 = 2$, and equation (1) can be written in the form

$$\frac{1}{2} + \frac{2}{N} = \frac{1}{n_2} + \frac{1}{n_3}.$$

Obviously, $n_2 \ge 4 \Rightarrow \frac{1}{n_2} + \frac{1}{n_3} \le \frac{1}{2}$, a contradiction. Hence, $n_2 = 2$ or 3.

If $n_2 = 2$, then $n_3 = N/2 = m$ ($N$ must be even), and $m_1 = m_2 = m$, $m_3 = 2$. These data correspond to the dihedral group $D_m$ (see Example 1 in Ch. 7).

If $n_2 = 3$, then

$$\frac{1}{6} + \frac{2}{N} = \frac{1}{n_3},$$

and we only have three possibilities:

1') $n_3 = 3$, $N = 12$, $m_1 = 6$, $m_2 = 4$, $m_3 = 4$;
2') $n_3 = 4$, $N = 24$, $m_1 = 12$, $m_2 = 8$, $m_3 = 6$;
3') $n_3 = 5$, $N = 60$, $m_1 = 30$, $m_2 = 20$, $m_3 = 12$.


We collect all of this information in the following table:

    Order N    Number of orbits    Orders of the stabilizers
    N          2                   N, N
    2m         3                   2, 2, m
    12         3                   2, 3, 3
    24         3                   2, 3, 4
    60         3                   2, 3, 5                      (2)


We have proved the following fact. 


THEOREM 1. Let $G$ be a finite subgroup of SO(3) which is not cyclic or dihedral. Then there are only three possibilities for $N = |G|$: $N = 12, 24, 60$. Other conditions satisfied by $G$ are given in the table (2). □
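The case analysis leading to Theorem 1 can also be checked mechanically. The following Python sketch is only an illustration: the function names `pole_data` and `is_cyclic_or_dihedral` and the search bound 200 are our own assumptions, not part of the text. It enumerates all solutions of equation (1) with $k = 2$ or 3 orbits in which every stabilizer order $n_i \ge 2$ divides $N$, and then sets aside the cyclic and dihedral families.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def pole_data(max_order=200):
    """Return all (N, (n_1, ..., n_k)) solving 2 - 2/N = sum(1 - 1/n_i).

    Assumptions of this sketch: k is 2 or 3, n_1 <= ... <= n_k,
    and each stabilizer order n_i >= 2 divides the group order N.
    """
    solutions = []
    for N in range(2, max_order + 1):
        divisors = [n for n in range(2, N + 1) if N % n == 0]
        target = 2 - Fraction(2, N)
        for k in (2, 3):
            for ns in combinations_with_replacement(divisors, k):
                if sum(1 - Fraction(1, n) for n in ns) == target:
                    solutions.append((N, ns))
    return solutions


def is_cyclic_or_dihedral(N, ns):
    """Case 1 (two orbits) or the dihedral data (2, 2, N/2)."""
    return len(ns) == 2 or ns[:2] == (2, 2)


exceptional = [(N, ns) for N, ns in pole_data()
               if not is_cyclic_or_dihedral(N, ns)]
print(exceptional)
```

Within the chosen bound, the only data that survive are $(N; n_1, n_2, n_3) = (12; 2, 3, 3)$, $(24; 2, 3, 4)$ and $(60; 2, 3, 5)$, in agreement with the theorem.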


2. Symmetry groups for regular polyhedra. It is not hard to prove the existence of groups of order 12, 24, 60 (which are not cyclic or dihedral) which are contained in SO(3). There are only five regular convex polyhedra in $\mathbb{R}^3$ (up to similarity). They have been known since antiquity; they are: the tetrahedron $\Delta_4$, the cube $\square_6$, the octahedron $\Delta_8$, the dodecahedron ⬠$_{12}$, and the icosahedron $\Delta_{20}$.


If the center of a regular polyhedron $M$ is placed at the origin in $\mathbb{R}^3$, then the rotations in SO(3) which take $M$ to itself form a finite subgroup. But instead of five, we only obtain three different (i.e., non-isomorphic) groups, since the cube and octahedron, and also the dodecahedron and icosahedron, lead to the same group. This is very easy to see geometrically. If we join the centers of adjacent faces of the cube with line segments, then these line segments are the edges of an octahedron inscribed in the cube. Every rotation in $\mathbb{R}^3$ which takes the cube to itself also takes the inscribed octahedron to itself, and conversely. A similar observation applies to the dodecahedron and icosahedron. In the table below, $N_0$ is the number of vertices of the polyhedron, $N_1$ is the number of edges, $N_2$ is the number of faces, $\mu$ is the number of sides (edges) in each face, and $\nu$ is the number of faces which meet at a vertex. As before, $N$ is the order of the corresponding group.

                    N_0    N_1    N_2    μ    ν     N
    Tetrahedron      4      6      4     3    3    12
    Cube             8     12      6     4    3    24
    Octahedron       6     12      8     3    4    24
    Dodecahedron    20     30     12     5    3    60
    Icosahedron     12     30     20     3    5    60


According to Euler's theorem on polyhedra, we have $N_0 - N_1 + N_2 = 2$. The total number of poles is equal to $N_0 + N_1 + N_2 = 2N_1 + 2$. Under any rotation which takes the polyhedron to itself, a given edge $a_1 b_1$ can go to any other edge $a_i b_i$ or $b_i a_i$; thus, $N = 2N_1$. We also note that $\{\mu, \nu\} = \{n_2, n_3\}$, where $n_2$ and $n_3$ are the multiplicities of the poles, which we introduced in Subsection 1.
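The counting relations just quoted are easy to confirm for all five polyhedra. The sketch below is a sanity check, not a proof; the dictionary `POLYHEDRA` and the function name `rotation_group_order` are illustrative assumptions, and the extra identity $\mu N_2 = 2N_1 = \nu N_0$ (each edge lies on two faces and has two endpoints) is a standard fact that is not stated in the text.

```python
# name: (N0 vertices, N1 edges, N2 faces, mu sides per face, nu faces per vertex)
POLYHEDRA = {
    "tetrahedron": (4, 6, 4, 3, 3),
    "cube": (8, 12, 6, 4, 3),
    "octahedron": (6, 12, 8, 3, 4),
    "dodecahedron": (20, 30, 12, 5, 3),
    "icosahedron": (12, 30, 20, 3, 5),
}


def rotation_group_order(name):
    """Check the relations quoted in the text and return N = 2 * N1."""
    n0, n1, n2, mu, nu = POLYHEDRA[name]
    assert n0 - n1 + n2 == 2              # Euler's theorem on polyhedra
    assert n0 + n1 + n2 == 2 * n1 + 2     # total number of poles
    assert mu * n2 == 2 * n1 == nu * n0   # standard incidence count (our addition)
    return 2 * n1


orders = {name: rotation_group_order(name) for name in POLYHEDRA}
print(orders)
```

The resulting orders are 12 for the tetrahedron, 24 for the cube and octahedron, and 60 for the dodecahedron and icosahedron.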

Further, let T be the group of the tetrahedron, O be the group of the cube (or 
octahedron), and I be the group of the icosahedron (dodecahedron). 

The elements of $T$ are the rotations through multiples of $2\pi/3$ around the four axes connecting the vertices with the centers of the opposite faces, the rotations through $\pi$ around each of the three axes connecting the midpoints of opposite edges, and the identity rotation.

Besides the identity, the group $O$ consists of the rotations through $\pi/2$, $\pi$, and $3\pi/2$ around the three axes connecting the centers of opposite faces of the cube, the rotations through $2\pi/3$ and $4\pi/3$ around the four axes connecting diametrically opposite vertices, and the rotations through $\pi$ around each of the six axes connecting the midpoints of diametrically opposite edges.

The regular tetrahedron can be inscribed in the cube, and then it remains invariant under some of the rotations of order 3 and 2 in $O$. There are 12 such rotations (including the identity), and they make up all of the group $T$. Consequently, $T \subset O$, and, since $|O : T| = 2$, it follows that $T \triangleleft O$.

To each element of $O$ there corresponds exactly one permutation of the set consisting of the four principal diagonals of the cube. Since $|O| = |S_4| = 24$, it follows that $O \cong S_4$. Similarly, $T \cong A_4$.

In Exercise 2 below, we see that $I \cong A_5$.


Returning to the proof of Theorem 1, we note that when $n_1 = 2$ and $n_2 = n_3 = 3$, there are two four-element orbits of poles $G(p_1) = \{p_1, p_2, p_3, p_4\}$ and $G(q_1) = \{q_1, q_2, q_3, q_4\}$, where $p_i$ and $q_i$ are opposite points on $S^2$. If $\Delta_4^0$ is the tetrahedron with vertices $p_i$, then its symmetry group $T^0$ contains $G$. Since $|G| = 12$, it follows that $\Delta_4^0$ is a regular tetrahedron, i.e., $\Delta_4^0 = \Delta_4$, and $T^0 = G \cong T$.

When $n_2 = 3$ and $n_3 = 4$, we take the six-element orbit of poles $G(p_1) = \{p_1, \ldots, p_6\}$. These poles divide up into pairs of opposite points, since $n_3 = 4 \ne 3 = n_2$ (the point opposite a pole of multiplicity 4 is again a pole of multiplicity 4, and so lies in the same orbit). We take these three pairs of points on $S^2$ as the three pairs of opposite vertices of an octahedron $\Delta_8^0$. As in the previous case, since $|G| = 24$, we have $\Delta_8^0 = \Delta_8$ (i.e., $\Delta_8^0$ is a regular octahedron), and $O^0 = G \cong O$.

Finally, when $n_1 = 2$, $n_2 = 3$ and $n_3 = 5$, we construct an icosahedron $\Delta_{20}^0$ whose vertices $p_i$ are taken from the orbit $G(p_1) = \{p_1, \ldots, p_{12}\}$. Again, since $|G| = 60$, it follows that $\Delta_{20}^0$ is regular, and $I^0 = G \cong I$.

It remains to note that any two regular polyhedra of the same type which are inscribed in the sphere $S^2$ can be obtained from one another by a rotation (or by a change of coordinates). This shows that the isomorphic finite subgroups of SO(3) are conjugate to one another. We gather together our results in the form of a theorem.


THEOREM 2. All of the finite subgroups of SO(3) are up to isomorphism one of the groups $C_n$, $D_n$ ($n = 1, 2, 3, \ldots$), $T \cong A_4$, $O \cong S_4$, or $I \cong A_5$. Any two isomorphic finite subgroups are conjugate in SO(3). □

COROLLARY. The isomorphisms in Theorem 2 give irreducible three-dimensional orthogonal representations of the groups $A_4$, $S_4$, and $A_5$. □


Using Theorem 2 and the epimorphism $\Phi : SU(2) \to SO(3)$ (Theorem 1 of §1 Ch. 7), we easily obtain a description of all of the finite subgroups of SU(2) (one can also go the other way, first finding the finite subgroups of SU(2), and then of SO(3)). Any such group $G^*$ which is not cyclic is the preimage of a finite subgroup $G \subset SO(3)$. This gives the so-called binary groups:

$$D_n^* = \Phi^{-1}(D_n), \quad T^* = \Phi^{-1}(T), \quad O^* = \Phi^{-1}(O), \quad I^* = \Phi^{-1}(I),$$

namely, the binary dihedral group, the binary tetrahedral group, the binary octahedral group, and the binary icosahedral group. Like the orthogonal representation $\Phi : SU(2) \to SO(3)$ itself, the binary groups arise in a natural way when one describes the states of a physical system of particles with spin.




EXERCISES 


1. Besides the trivial subgroup, the icosahedral group $I$ contains 15 conjugate cyclic subgroups of order 2, 10 conjugate cyclic subgroups of order 3, and 6 conjugate cyclic subgroups of order 5. Prove that $I$ is a simple group.


2. Construct an isomorphism between the groups $I$ and $A_5$.


3. Show that, if $H$ is a finite subgroup of odd order in SU(2) or SO(3), then $H$ is cyclic.

4. Show that, if a finite subgroup $H \subset SU(2)$ is not the preimage of any subgroup $G \subset SO(3)$, then $|H| \equiv 1 \pmod 2$.

5. Show that, up to conjugation

6. What do the following two groups have in common: the binary icosahedral group $I^*$ and the group

$$SL(2, \mathbb{Z}_5) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\Big|\; ad - bc = 1;\ a, b, c, d \in \mathbb{Z}_5 \right\} ?$$


7. Suppose that atoms of $q$ different sorts ($q < 200$) can be placed in any possible way (we are neglecting chemical bonds) at the vertices of a regular polyhedron $M$. We do not distinguish between the "molecules" which can be obtained from one another by a rotation around some axis. Let $f(M, q)$ be the number of different "molecules". Derive the formulas:

$$f(\Delta_4, q) = \frac{q^2}{12}\,(q^2 + 11), \qquad f(\square_6, q) = \frac{q^2}{24}\,(q^6 + 17q^2 + 6),$$

$$f(\Delta_8, q) = \frac{q^2}{24}\,(q^4 + 3q^2 + 12q + 8).$$


8. Show that, if we compute the number of ways of coloring the faces of $M$ with $q$ sorts of colors, in the case of the tetrahedron $\Delta_4$ we obtain the same formula as in Exercise 7, and in the case of the cube and octahedron the formulas are interchanged.
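Counts of the kind asked for in Exercises 7 and 8 can be tested numerically before one attempts a derivation. The sketch below is one possible check under stated assumptions: it realizes the group $O$ as the 24 signed permutation matrices of determinant 1 acting on the cube with vertices $(\pm 1, \pm 1, \pm 1)$, takes $T$ to be the subgroup preserving an inscribed tetrahedron, and counts orbits of colorings by averaging the number of fixed colorings over the group (Burnside's lemma, which is our choice of method and is not named in this section). All function and variable names are illustrative.

```python
from itertools import permutations, product


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cube_rotations():
    """The signed permutation matrices of determinant +1 (the group O)."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[signs[r] if perm[r] == c else 0 for c in range(3)]
                 for r in range(3)]
            if det3(m) == 1:
                mats.append(m)
    return mats


def act(m, p):
    """Image of the point p under the matrix m."""
    return tuple(sum(m[r][c] * p[c] for c in range(3)) for r in range(3))


def cycles(m, points):
    """Number of cycles of the permutation of `points` induced by m."""
    seen, count = set(), 0
    for p in points:
        if p not in seen:
            count += 1
            while p not in seen:
                seen.add(p)
                p = act(m, p)
    return count


def molecules(group, points, q):
    """Number of orbits of q-colorings of `points` (Burnside's lemma)."""
    total = sum(q ** cycles(m, points) for m in group)
    return total // len(group)


O = cube_rotations()
CUBE = list(product((1, -1), repeat=3))
OCTAHEDRON = [tuple(s if i == j else 0 for j in range(3))
              for i in range(3) for s in (1, -1)]
# Four alternate vertices of the cube form an inscribed regular tetrahedron.
TETRAHEDRON = [p for p in CUBE if p[0] * p[1] * p[2] == 1]
T = [m for m in O if all(act(m, p) in TETRAHEDRON for p in TETRAHEDRON)]

print(molecules(T, TETRAHEDRON, 2), molecules(O, CUBE, 2), molecules(O, OCTAHEDRON, 2))
```

Since the face centers of the cube are the vertices of the inscribed octahedron, the same two point sets also cover the face colorings of Exercise 8 for the cube and octahedron.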
§4. Characters of linear representations


1. Schur's lemma and corollary. At the base of every fundamental mathematical theory one usually finds several relatively simple (but subtle) ideas. One of the cornerstones of representation theory is the following fact.

THEOREM 1 (Schur's lemma). Suppose that $(\Phi, V)$ and $(\Psi, W)$ are two irreducible complex representations of a group $G$, and suppose that $\sigma : V \to W$ is a linear map such that

$$\Psi(g)\,\sigma = \sigma\,\Phi(g), \quad \forall g \in G. \tag{1}$$

Then:

(i) if the representations $\Phi$ and $\Psi$ are not equivalent, it follows that $\sigma = 0$;

(ii) if $V = W$ and $\Phi = \Psi$, then $\sigma = \lambda\mathcal{E}$ for some scalar $\lambda$.


Proof. If $\sigma = 0$, there is nothing to prove. So suppose that $\sigma \ne 0$, and set

$$V_0 = \operatorname{Ker} \sigma \subset V.$$

Since $\sigma\Phi(g)v_0 = \Psi(g)\sigma v_0 = 0$ for any $v_0 \in V_0$, it follows that $\Phi(g)V_0 \subset V_0$, i.e., the subspace $V_0$ is G-invariant. Since $(\Phi, V)$ is irreducible, we have $V_0 = 0$ or $V$. But we cannot have $V_0 = V$, because $\sigma \ne 0$. Hence, $\operatorname{Ker} \sigma = 0$.

Similarly, if we set $W_1 = \operatorname{Im} \sigma \subset W$, we have $w_1 \in W_1 \Rightarrow \Psi(g)w_1 = \Psi(g)\sigma(v_1) = \sigma(\Phi(g)v_1) = w_1' \in W_1$, so that $W_1$ is an invariant subspace of $W$. Again $\sigma \ne 0 \Rightarrow W_1 \ne 0$, and, since $(\Psi, W)$ is an irreducible representation, the only possibility is that $W_1 = W$.

(i) Since $\operatorname{Ker} \sigma = 0$ and $\operatorname{Im} \sigma = W$, it follows that $\sigma : V \to W$ is an isomorphism, and condition (1) is neither more nor less than the definition of equivalence of two representations $\Phi$ and $\Psi$ (see Definition 2 in §1). This proves assertion (i).

(ii) By assumption, $\sigma : V \to V$ is a linear operator on $V$. Since $\mathbb{C}$ is algebraically closed, $\sigma$ has an eigenvalue; let $\lambda$ be an eigenvalue of $\sigma$. The linear operator $\sigma_0 = \sigma - \lambda\mathcal{E}$ has a non-trivial kernel (since it contains an eigenvector for $\lambda$), and it satisfies the equality $\Psi(g)\sigma_0 = \sigma_0\Phi(g)$. By what was proved before, this means that $\sigma_0 = 0$, i.e., $\sigma = \lambda\mathcal{E}$. □


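COROLLARY. Let $(\Phi, V)$ and $(\Psi, W)$ be two irreducible complex representations of a finite group $G$ of order $|G|$, and let $\sigma : V \to W$ be any linear map. Then the "averaging" map

$$\tilde\sigma = |G|^{-1} \sum_{g \in G} \Psi(g)\,\sigma\,\Phi(g)^{-1}$$

has the following properties:

(i) $\Phi \not\approx \Psi \Rightarrow \tilde\sigma = 0$;

(ii) $V = W$, $\Phi = \Psi \Rightarrow \tilde\sigma = \lambda\mathcal{E}$, $\lambda = \dfrac{\operatorname{tr} \sigma}{\dim V}$.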


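Proof. We have

$$\Psi(g)\,\tilde\sigma\,\Phi(g)^{-1} = |G|^{-1} \sum_{h \in G} \Psi(g)\Psi(h)\,\sigma\,\Phi(h)^{-1}\Phi(g)^{-1} = |G|^{-1} \sum_{h \in G} \Psi(gh)\,\sigma\,\Phi(gh)^{-1} = |G|^{-1} \sum_{t \in G} \Psi(t)\,\sigma\,\Phi(t)^{-1} = \tilde\sigma ,$$

so that $\Psi(g)\tilde\sigma = \tilde\sigma\Phi(g)$, $\forall g \in G$. Schur's lemma immediately gives us both assertions, and the precise formula for $\lambda$ follows from the relations

$$(\dim V)\lambda = \operatorname{tr} \lambda\mathcal{E} = \operatorname{tr} \tilde\sigma = |G|^{-1} \sum_{g \in G} \operatorname{tr} \Phi(g)\,\sigma\,\Phi(g)^{-1} = |G|^{-1} \sum_{g \in G} \operatorname{tr} \sigma = \operatorname{tr} \sigma .$$

Here we have used the well-known property of the trace function: $\operatorname{tr} CAC^{-1} = \operatorname{tr} A$. □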
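As a small illustration of part (ii) of the corollary, the sketch below averages a matrix over a two-dimensional irreducible representation of $S_3$. The particular integer matrices `R` (of order 3) and `S` (of order 2) are our own choice of a concrete realization, not taken from the text; they satisfy $R^3 = S^2 = E$ and $SRS = R^{-1}$. Exact rational arithmetic is used so that the result can be compared with $(\operatorname{tr} \sigma / \dim V)\,\mathcal{E}$ without rounding.

```python
from fractions import Fraction

I2 = [[1, 0], [0, 1]]
R = [[0, -1], [1, -1]]   # assumed generator of order 3
S = [[0, 1], [1, 0]]     # assumed generator of order 2


def mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv(a):
    """Inverse of an invertible 2x2 matrix, with exact fractions."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[Fraction(a[1][1], det), Fraction(-a[0][1], det)],
            [Fraction(-a[1][0], det), Fraction(a[0][0], det)]]


R2 = mul(R, R)
GROUP = [I2, R, R2, S, mul(S, R), mul(S, R2)]   # the six matrices Phi(g)


def average(sigma):
    """The averaging map: |G|^{-1} * sum over g of Phi(g) sigma Phi(g)^{-1}."""
    total = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]
    for g in GROUP:
        term = mul(mul(g, sigma), inv(g))
        total = [[total[i][j] + term[i][j] for j in range(2)]
                 for i in range(2)]
    return [[x / len(GROUP) for x in row] for row in total]


print(average([[1, 2], [3, 4]]))
```

For $\sigma$ with rows $(1, 2)$ and $(3, 4)$ the trace is 5, and the averaged matrix is the scalar matrix $\frac{5}{2}\mathcal{E}$, as the corollary predicts.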


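We shall need the matrix version of this corollary. To formulate this, we choose any bases in $V$ and $W$: $V = \langle e_i \mid i \in I \rangle$, $W = \langle f_j \mid j \in J \rangle$. We write our maps in these bases, and identify the maps with the corresponding matrices:

$$\Phi_g = (\varphi_{ii'}(g)), \quad \Psi_g = (\psi_{jj'}(g)), \quad \sigma = (\sigma_{ji}), \quad \tilde\sigma = (\tilde\sigma_{ji}); \qquad i, i' \in I, \ j, j' \in J.$$

By the definition of $\tilde\sigma$, we have

$$\tilde\sigma_{ji} = |G|^{-1} \sum_{g \in G,\ i' \in I,\ j' \in J} \psi_{jj'}(g)\, \sigma_{j'i'}\, \varphi_{i'i}(g^{-1}). \tag{2}$$

Our map $\sigma : V \to W$ is completely arbitrary. We can take, for example,

$$\sigma_{ji} = 0 \quad \forall (j, i) \ne (j_0, i_0); \qquad \sigma_{j_0 i_0} = 1. \tag{3}$$

Part (i) of the corollary then corresponds to the relation

$$|G|^{-1} \sum_{g \in G} \psi_{jj_0}(g)\, \varphi_{i_0 i}(g^{-1}) = 0, \quad \forall i, i_0, j, j_0 \tag{4}$$

($\Phi$ and $\Psi$ are inequivalent representations).

Now if $V = W$ and $\Phi = \Psi$, then

$$\tilde\sigma_{ji} = \lambda\delta_{ji} = \frac{\operatorname{tr} \sigma}{\dim V}\,\delta_{ji} = \frac{\delta_{ji}}{\dim V} \sum_{j'} \sigma_{j'j'} = \frac{\delta_{ji}}{\dim V} \sum_{i', j'} \sigma_{j'i'}\,\delta_{j'i'} .$$

Comparing this expression with (2), we obtain

$$|G|^{-1} \sum_{g \in G,\ i',\ j'} \varphi_{jj'}(g)\, \sigma_{j'i'}\, \varphi_{i'i}(g^{-1}) = \frac{\delta_{ji}}{\dim V} \sum_{i', j'} \sigma_{j'i'}\,\delta_{j'i'} ,$$

from which, because of the arbitrariness in the choice of $\sigma$ (see (3)), we conclude that part (ii) of the corollary corresponds to the relation

$$|G|^{-1} \sum_{g \in G} \varphi_{jj_0}(g)\, \varphi_{i_0 i}(g^{-1}) = \begin{cases} \dfrac{1}{\dim V} & \text{if } j = i \text{ and } j_0 = i_0 , \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

Relations (4) and (5) contain the information we shall need.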


2. Characters of representations. To each complex finite-dimensional linear representation $(\Phi, V)$ of a group $G$ we associate the function

$$\chi_\Phi : G \to \mathbb{C},$$

defined by setting

$$\chi_\Phi(g) = \operatorname{tr} \Phi(g), \quad \forall g \in G;$$

this function is called the character of the representation. It can also be denoted $\chi_V$ or simply $\chi$ if it is clear what representation is being discussed.

Let $\Phi_g = (\varphi_{ij}(g))$ be the matrix corresponding to the operator $\Phi(g)$ in some basis of the space $V$, and let $\lambda_1, \ldots, \lambda_n$ ($n = \dim V$) be the characteristic roots of this matrix, counted with multiplicity. (The $\lambda_i$, of course, depend on $g$.) By definition, we have

$$\chi_\Phi(g) = \operatorname{tr} \Phi_g = \sum_{i=1}^{n} \varphi_{ii}(g) = \sum_{i=1}^{n} \lambda_i .$$

If $C$ is any invertible matrix, then

$$\operatorname{tr} C\Phi_g C^{-1} = \operatorname{tr} \Phi_g .$$

But we know that every representation $\Psi$ which is equivalent to $\Phi$ has the form $\Psi_g = C\Phi_g C^{-1}$. Hence the characters of isomorphic (equivalent) representations coincide. In other words, the notion of the character of a representation is well-defined, i.e., depends only on the equivalence class of the representation.


We note some more elementary properties of the characters of representations. 


PROPOSITION. Let $\chi_\Phi$ be the character of a complex linear representation $(\Phi, V)$ of a group $G$. Then:

(i) $\chi_\Phi(e) = \dim V$;

(ii) $\chi_\Phi(hgh^{-1}) = \chi_\Phi(g)$, $\forall g, h \in G$, i.e., $\chi_\Phi$ is a function which is constant on conjugacy classes of elements of $G$;

(iii) $\chi_\Phi(g^{-1}) = \overline{\chi_\Phi(g)}$ for any element $g \in G$ of finite order (the bar denotes complex conjugation);

(iv) the direct sum $\Phi = \Phi' + \Phi''$ of two representations has character $\chi_\Phi = \chi_{\Phi'} + \chi_{\Phi''}$.

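Proof. First, $\chi_\Phi(e) = \operatorname{tr} \Phi(e) = \operatorname{tr} \mathcal{E} = \dim V$. Next, $\chi_\Phi(hgh^{-1}) = \operatorname{tr} \Phi(hgh^{-1}) = \operatorname{tr} \Phi(h)\Phi(g)\Phi(h)^{-1} = \operatorname{tr} \Phi(g) = \chi_\Phi(g)$. To prove (iii), we note that

$$g^m = e \Rightarrow \Phi(g)^m = \mathcal{E},$$

and, if $\lambda_1, \ldots, \lambda_n$ are the characteristic roots of the operator $\Phi(g)$, then $\lambda_1^k, \ldots, \lambda_n^k$ are the characteristic roots of the operator $\Phi(g)^k$. In particular, $\lambda_i^m = 1$, $1 \le i \le n$, and hence $\lambda_i^{-1} = \overline{\lambda_i}$. Thus,

$$\chi_\Phi(g^{-1}) = \operatorname{tr} \Phi(g^{-1}) = \operatorname{tr} \Phi(g)^{-1} = \sum_i \lambda_i^{-1} = \sum_i \overline{\lambda_i} = \overline{\chi_\Phi(g)} .$$

Finally, if $\Phi = \Phi' + \Phi''$, we know that for a suitable choice of basis in the representation space $V$ all of the matrices $\Phi_g$, $g \in G$, have the form

$$\Phi_g = \begin{pmatrix} \Phi'_g & 0 \\ 0 & \Phi''_g \end{pmatrix},$$

and hence $\operatorname{tr} \Phi_g = \operatorname{tr} \Phi'_g + \operatorname{tr} \Phi''_g$. But this means that $\chi_\Phi(g) = \chi_{\Phi'}(g) + \chi_{\Phi''}(g)$. □

Note that if $n = \dim V = 1$, then $\chi_\Phi(g) = \Phi(g)$, but that for $n > 1$ the character $\chi_\Phi$ is not a homomorphism from $G$ to $\mathbb{C}$.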


Example 1. We consider the natural two-dimensional representation of the group SU(2). Let $\chi$ be the corresponding character. According to (5) in §1 Ch. 7, any matrix $g \in SU(2)$ is conjugate to a matrix of the form

$$b_\varphi = \begin{pmatrix} e^{i\varphi/2} & 0 \\ 0 & e^{-i\varphi/2} \end{pmatrix},$$

so that the conjugacy classes of elements of SU(2) are parametrized by the real numbers $\varphi$ in the interval $[0, 2\pi)$. According to property (ii) of characters, we have:

$$\chi(g) = \chi(b_\varphi) = e^{i\varphi/2} + e^{-i\varphi/2} = 2\cos\frac{\varphi}{2} .$$

Under the canonical representation $\Phi : SU(2) \to SO(3)$, the matrix $b_\varphi$ goes to the matrix

$$B_\varphi = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

which is also a convenient choice of conjugacy class representative in the group SO(3). It is obvious that

$$\chi_\Phi(b_\varphi) = 1 + 2\cos\varphi . \tag{6}$$

We shall later make use of the formula (6).
The set $\mathbb{C}^G = \{G \to \mathbb{C}\}$ of all functions from $G$ to $\mathbb{C}$ has a natural vector space structure over $\mathbb{C}$: if $\alpha_1, \alpha_2 \in \mathbb{C}$ and $\chi_1, \chi_2 \in \mathbb{C}^G$, then by $\alpha_1\chi_1 + \alpha_2\chi_2$ we mean the function with values

$$(\alpha_1\chi_1 + \alpha_2\chi_2)(g) = \alpha_1\chi_1(g) + \alpha_2\chi_2(g).$$

A function in $\mathbb{C}^G$ is called central if it is constant on each conjugacy class of the group $G$. The central functions obviously form a vector subspace of $\mathbb{C}^G$, which we denote $X_{\mathbb{C}}(G)$. Generally speaking, $X_{\mathbb{C}}(G)$ is an infinite dimensional space, but if $G$ has only finitely many conjugacy classes $C_1, C_2, \ldots, C_r$ (as is always the case if $G$ is finite), then the space $X_{\mathbb{C}}(G)$ is finite dimensional. For example,

$$X_{\mathbb{C}}(G) = \langle \Gamma_1, \ldots, \Gamma_r \rangle_{\mathbb{C}}, \tag{7}$$

where

$$\Gamma_i(g) = \begin{cases} 1 & \text{if } g \in C_i , \\ 0 & \text{if } g \notin C_i . \end{cases}$$

By what we have proved (part (ii) of the proposition), the characters of the group $G$ belong to the space $X_{\mathbb{C}}(G)$. We shall see that the space spanned by the characters actually is all of $X_{\mathbb{C}}(G)$, at least when $G$ is a finite group.

We now suppose that $G$ is finite. We make $\mathbb{C}^G$ into a hermitian space by introducing the scalar product

$$(\sigma, \tau)_G = |G|^{-1} \sum_{g \in G} \sigma(g)\,\overline{\tau(g)}, \quad \sigma, \tau \in \mathbb{C}^G . \tag{8}$$

It is easily verified that the form $(\sigma, \tau) \mapsto (\sigma, \tau)_G$ satisfies all of the properties of a non-degenerate hermitian form. Its restriction to the subspace $X_{\mathbb{C}}(G) \subset \mathbb{C}^G$ is a very useful tool, especially for studying the characters of linear representations.


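THEOREM 2. Let $\Phi$ and $\Psi$ be irreducible complex representations of a finite group $G$. Then

$$(\chi_\Phi, \chi_\Psi)_G = \begin{cases} 1 & \text{if } \Phi \approx \Psi , \\ 0 & \text{if } \Phi \not\approx \Psi . \end{cases} \tag{9}$$

Proof. In matrix notation we have

$$\chi_\Phi(g) = \sum_{i} \varphi_{ii}(g), \qquad \chi_\Psi(g) = \sum_{j} \psi_{jj}(g).$$

Setting $i_0 = i$ and $j_0 = j$ in (4) and then summing over $i$ and $j$ (in the appropriate range), we obtain

$$0 = |G|^{-1} \sum_{g, i, j} \psi_{jj}(g)\,\varphi_{ii}(g^{-1}) = |G|^{-1} \sum_{g} \Big[\sum_j \psi_{jj}(g)\Big]\Big[\sum_i \varphi_{ii}(g^{-1})\Big] = |G|^{-1} \sum_{g} \chi_\Psi(g)\,\chi_\Phi(g^{-1}) = |G|^{-1} \sum_{g} \chi_\Psi(g)\,\overline{\chi_\Phi(g)} = (\chi_\Psi, \chi_\Phi)_G$$

for any non-equivalent irreducible representations $\Phi$ and $\Psi$ of the group $G$.

We now use (5) (for $i_0 = i$, $j_0 = j$):

$$1 = \sum_{i, j} \frac{\delta_{ij}}{\dim V} = |G|^{-1} \sum_{g, i, j} \varphi_{jj}(g)\,\varphi_{ii}(g^{-1}) = |G|^{-1} \sum_{g} \chi_\Phi(g)\,\chi_\Phi(g^{-1}) = (\chi_\Phi, \chi_\Phi)_G .$$

Since the characters of isomorphic representations coincide, it follows that we have $(\chi_\Phi, \chi_\Psi)_G = 1$ when $\Phi \approx \Psi$. □

The relation (9) is called the (first) orthogonality relation for characters.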
COROLLARY. Let

$$V = V_1 \oplus \cdots \oplus V_k \tag{10}$$

be a decomposition of the complex G-space $V$ into a direct sum of irreducible G-spaces $V_i$. If $W$ is an irreducible G-space with character $\chi_W$, then the number of terms $V_i$ in (10) which are isomorphic to $W$ is equal to $(\chi_V, \chi_W)_G$ and does not depend on the choice of decomposition (10). This number is called the multiplicity with which $W$ occurs in $V$. Two representations (two G-spaces) with the same character are isomorphic.

Proof. As we have already noted (part (iv) of the proposition), we have $\chi_V = \chi_{V_1} + \cdots + \chi_{V_k}$, and hence

$$(\chi_V, \chi_W)_G = (\chi_{V_1}, \chi_W)_G + \cdots + (\chi_{V_k}, \chi_W)_G .$$

By Theorem 2, the sum on the right consists of $k$ zeros and ones, and the number of ones is equal to the number of G-subspaces $V_i$ which are isomorphic to $W$. But the scalar product $(\chi_V, \chi_W)_G$ does not depend on any direct sum decomposition (see the definition (8)), so that we have also shown that the multiplicity of $W$ in $V$ is a well-defined number, depending only on $W$ and $V$.

Suppose that two G-spaces $V$ and $V'$ have the same character $\chi = \chi_V = \chi_{V'}$. Given any irreducible G-space $W$, $V$ and $V'$ contain $W$ the same number of times, namely $(\chi, \chi_W)_G$. Hence, if we decompose $V$ and $V'$ into direct sums of irreducible G-spaces:

$$V = V_1 \oplus \cdots \oplus V_k , \qquad V' = V'_1 \oplus \cdots \oplus V'_l ,$$

we have $l = k$ and $V'_i \cong V_i$, $1 \le i \le k$ (for a suitable ordering of the $V'_i$). Hence, $V$ and $V'$ are isomorphic G-spaces. □


The remarks following the proof of Theorem 2 of §2 and the above corollary allow us to express the character $\chi_\Phi$ of any complex linear representation $(\Phi, V)$ of a finite group $G$ as a linear combination with integer coefficients

$$\chi_\Phi = \sum_{i=1}^{s} m_i \chi_i .$$

Here $m_i$ is the multiplicity with which the irreducible representation $(\Phi_i, V_i)$ with character $\chi_i$ occurs in $(\Phi, V)$, so that we assume that $\Phi_i \not\approx \Phi_j$ for $i \ne j$. Using the orthogonality relation (9), we may write:

$$(\chi_\Phi, \chi_\Phi)_G = \sum_{i=1}^{s} m_i^2 . \tag{11}$$

We conclude: the scalar square $(\chi_\Phi, \chi_\Phi)_G$ of the character $\chi_\Phi$ of an arbitrary complex representation $\Phi$ is always an integer; it equals 1 if and only if $\Phi$ is irreducible.

We have arrived at a remarkable result. The characters, i.e., the "traces of representations", which contain the scantiest information about each separate linear operator $\Phi(g)$, somehow in their totality express all of the essential properties of the totality $\{\Phi(g) \mid g \in G\}$, i.e., the properties of the representation $\Phi$.


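Example 2. We show that the representations of the groups $A_4$, $S_4$ and $A_5$ by rotations of three-dimensional space are irreducible over $\mathbb{C}$. To do this we must return to the corollary of Theorem 2 §3 and make use of the formulas (6) and (11). According to the description of the representation $\Phi$ in §3, if $\pi$ is a permutation of order $q$, then $\Phi(\pi)$ is rotation about some axis through an angle of $2\pi k/q$, where g.c.d.$(k, q) = 1$. Hence, the values of the character $\chi = \chi_\Phi$ can be computed directly from formula (6):

$$\chi(\pi) = 1 + 2\cos\frac{2\pi k}{q} = 3,\ -1,\ 0,\ 1,\ \frac{1 + \sqrt5}{2},\ \frac{1 - \sqrt5}{2}$$

for $q = 1, 2, 3, 4, 5\ (k = 1), 5\ (k = 2)$, respectively. We note that

$$2\cos\frac{2\pi}{5} = \frac{\sqrt5 - 1}{2}, \qquad 2\cos\frac{4\pi}{5} = -\frac{\sqrt5 + 1}{2} .$$

In Corollary 1 of Theorem 4 §2 Ch. 4, we described how to compute the order of $\pi$ from its decomposition into disjoint cycles. The elements are divided into conjugacy classes in the tables there (see Exercise 8 §2 Ch. 7 for $A_4$, Exercise 4 §3 Ch. 7 for $S_4$, and the proof of Theorem 5 §3 Ch. 7 for $A_5$). Here are the same tables, with the values of $\chi$ filled in:

    A_4    representative        e    (12)(34)   (123)   (132)
           number of elements    1       3         4       4
           χ                     3      -1         0       0

    S_4    representative        e    (12)(34)   (12)   (123)   (1234)
           number of elements    1       3        6       8       6
           χ                     3      -1       -1       0       1

    A_5    representative        e    (12)(34)   (123)   (12345)     (12354)
           number of elements    1      15        20       12          12
           χ                     3      -1         0    (1+√5)/2    (1-√5)/2

The relations

$$(\chi, \chi)_{A_4} = \frac{1}{12}\left\{1 \cdot 3^2 + 3 \cdot (-1)^2 + 4 \cdot 0^2 + 4 \cdot 0^2\right\} = 1,$$

$$(\chi, \chi)_{S_4} = \frac{1}{24}\left\{1 \cdot 3^2 + 3 \cdot (-1)^2 + 6 \cdot (-1)^2 + 8 \cdot 0^2 + 6 \cdot 1^2\right\} = 1,$$

$$(\chi, \chi)_{A_5} = \frac{1}{60}\left\{1 \cdot 3^2 + 15 \cdot (-1)^2 + 20 \cdot 0^2 + 12\left(\frac{1 + \sqrt5}{2}\right)^2 + 12\left(\frac{1 - \sqrt5}{2}\right)^2\right\} = 1$$

show that the representation $\Phi$ with character $\chi$ is irreducible over $\mathbb{C}$ (see (11)).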
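The irreducibility criterion (11) used in Example 2 lends itself to a quick numerical check. In the sketch below, the class sizes and element orders for $A_4$, $S_4$ and $A_5$ are entered by hand as standard facts about these groups, the character values come from formula (6), and the names `chi`, `CLASSES` and `scalar_square` are our own. Floating-point arithmetic is used, so the scalar squares come out equal to 1 only up to rounding.

```python
import math


def chi(q, k=1):
    """Character value 1 + 2*cos(2*pi*k/q) of a rotation through 2*pi*k/q."""
    return 1 + 2 * math.cos(2 * math.pi * k / q)


# For each conjugacy class: (number of elements, order q of the elements, k).
CLASSES = {
    "A4": [(1, 1, 1), (3, 2, 1), (4, 3, 1), (4, 3, 1)],
    "S4": [(1, 1, 1), (3, 2, 1), (6, 2, 1), (8, 3, 1), (6, 4, 1)],
    "A5": [(1, 1, 1), (15, 2, 1), (20, 3, 1), (12, 5, 1), (12, 5, 2)],
}


def scalar_square(classes):
    """The scalar square of chi: |G|^{-1} * sum of |chi(g)|^2 over the group."""
    order = sum(size for size, _, _ in classes)
    return sum(size * chi(q, k) ** 2 for size, q, k in classes) / order


for name, classes in CLASSES.items():
    print(name, round(scalar_square(classes), 12))
```

A value of 1 for each of the three groups corresponds to a single irreducible constituent of multiplicity one.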


EXERCISES 


1. Let $\Phi$ and $\Psi$ be irreducible complex representations of a finite group $G$. Derive the following generalization of Theorem 2:

$$|G|^{-1} \sum_{g \in G} \chi_\Phi(hg)\,\overline{\chi_\Psi(g)} = \delta_{\Phi, \Psi}\, \frac{\chi_\Phi(h)}{\chi_\Phi(e)} .$$

Here $h$ is any element of $G$; $\delta_{\Phi, \Psi} = 1$ or 0 depending on whether or not $\Phi \approx \Psi$.

2. Apply the irreducibility criterion in terms of characters to the representation $\Phi$ of $S_3$ in Example 1 of Subsection 1 §2.

3. Using Schur's lemma, prove that all irreducible complex representations of an abelian group $G$ are one-dimensional.

4. If $\tau$ is an automorphism of the group $G$, then for every linear representation $(\Phi, V)$ of $G$ we have another representation $(\Phi^\tau, V)$, which is defined by the rule: $\Phi^\tau(g) = \Phi(\tau(g))$. Verify that this is a representation, and that $\Phi^\tau$ is irreducible whenever $\Phi$ is. It usually happens that $\Phi^\tau \approx \Phi$, but there are cases when we obtain a new representation. What happens if $\tau$ is an inner automorphism?

Let $G = A_5$, and let $\Phi$ be the representation in Example 2. The map $\tau : \pi \mapsto (12)^{-1}\pi(12)$ is an (outer) automorphism of $A_5$ which interchanges the conjugacy classes of (12345) and (12354). The sets of values of the characters $\chi_\Phi$ and $\chi_{\Phi^\tau}$ are obtained from one another by switching $(1 + \sqrt5)/2$ and $(1 - \sqrt5)/2$. Show that the characters $\chi_\Phi$ and $\chi_{\Phi^\tau}$ are distinct, i.e., that the representations $\Phi$ and $\Phi^\tau$ are non-equivalent.

5. Let $\Phi : G \to U(n)$ and $\Psi : G \to U(n)$ be equivalent irreducible unitary representations of a finite group $G$. Prove that there exists a unitary matrix $U$ such that

$$\Psi(g) = U\,\Phi(g)\,U^{-1}, \quad \forall g \in G.$$


§5. Irreducible representations of finite groups

1. The number of irreducible representations. In the case of finite groups, the above ideas allow us to answer the basic questions of representation theory. One of these fundamental facts is the following

THEOREM 1. The number of irreducible pair-wise non-equivalent representations of a finite group $G$ over $\mathbb{C}$ is equal to the number of conjugacy classes in $G$.

The proof of this theorem is contained in Lemmas 1 and 2, if we note that the number $r$ of conjugacy classes in $G$ can be interpreted as the dimension of the space $X_{\mathbb{C}}(G)$ of complex-valued central functions on $G$ (see (7) §4). Since the characters of linear representations are central functions, they span a subspace of $X_{\mathbb{C}}(G)$ of some dimension $s \le r$. By Theorem 2 §4, the characters of irreducible representations form an orthonormal basis (in the metric $(*, *)_G$) for this subspace. Hence, the number of irreducible representations is $s$. We know that $s \le r$, and so it remains to prove the following two lemmas.


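LEMMA 1. Let $\Gamma$ be a central function on a group $G$, and let $(\Phi, V)$ be a complex irreducible representation with character $\chi_\Phi$. Then for the linear operator

$$\Phi_\Gamma = \sum_{h \in G} \overline{\Gamma(h)}\,\Phi(h) : V \to V$$

we have $\Phi_\Gamma = \lambda\mathcal{E}$, where $\lambda = \dfrac{|G|}{\chi_\Phi(e)}\,(\chi_\Phi, \Gamma)_G$. ($\bar\Gamma$ is the central function defined by setting $\bar\Gamma(g) = \overline{\Gamma(g)}$.)

Proof. Since $\Gamma$ is a central function, we have

$$\Phi(g)\,\Phi_\Gamma\,\Phi(g)^{-1} = \sum_{h \in G} \overline{\Gamma(h)}\,\Phi(g)\Phi(h)\Phi(g)^{-1} = \sum_{h \in G} \overline{\Gamma(ghg^{-1})}\,\Phi(ghg^{-1}) = \sum_{t \in G} \overline{\Gamma(t)}\,\Phi(t) = \Phi_\Gamma .$$

Thus, $\Phi_\Gamma\Phi(g) = \Phi(g)\Phi_\Gamma$, $\forall g \in G$. Schur's lemma (Theorem 1 §4), applied to $\sigma = \Phi_\Gamma$, shows that $\Phi_\Gamma = \lambda\mathcal{E}$. Computing the trace of the operators on both sides of this equality, we find that

$$\lambda\chi_\Phi(e) = \lambda \dim V = \operatorname{tr} \lambda\mathcal{E} = \operatorname{tr} \Phi_\Gamma = \sum_{h \in G} \overline{\Gamma(h)}\,\operatorname{tr} \Phi(h) = |G| \cdot |G|^{-1} \sum_{h \in G} \chi_\Phi(h)\,\overline{\Gamma(h)} = |G|\,(\chi_\Phi, \Gamma)_G . \quad □$$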
LEMMA 2. The characters $\chi_1, \ldots, \chi_s$ of all of the pair-wise non-equivalent irreducible representations of $G$ over $\mathbb{C}$ form an orthonormal basis for the space $X_{\mathbb{C}}(G)$.

Proof. By Theorem 2 of §4, the set $\chi_1, \ldots, \chi_s$ is orthonormal and can be included in an orthonormal basis for $X_{\mathbb{C}}(G)$. Let $\Gamma$ be any central function which is orthogonal to all of the $\chi_i$: $(\chi_i, \Gamma)_G = 0$. Then, by Lemma 1, the linear operator $\Phi^{(i)}_\Gamma$ corresponding to the representation $\Phi^{(i)}$ with character $\chi_i$ is equal to zero.

By Theorem 2 §2, every complex representation $\Phi$ can be decomposed into a direct sum

$$\Phi = m_1\Phi^{(1)} + \cdots + m_s\Phi^{(s)}$$

with certain multiplicities $m_1, \ldots, m_s$. For the operator $\Phi_\Gamma$ defined by the relation

$$\Phi_\Gamma = \sum_{h \in G} \overline{\Gamma(h)}\,\Phi(h),$$

we have the corresponding decomposition

$$\Phi_\Gamma = m_1\Phi^{(1)}_\Gamma + \cdots + m_s\Phi^{(s)}_\Gamma = 0 .$$

In particular, this holds for the linear operator $\rho_\Gamma$, where $\rho$ is the regular representation (see Example 5 of §1). But in that case we have (here we temporarily let 1 denote the unit element of $G$, so as to avoid writing $e_e$):

$$0 = \rho_\Gamma(e_1) = \sum_{h \in G} \overline{\Gamma(h)}\,\rho(h)e_1 = \sum_{h \in G} \overline{\Gamma(h)}\,e_h \ \Rightarrow\ \overline{\Gamma(h)} = 0, \quad \forall h \in G ,$$

from which $\bar\Gamma = 0$, and hence $\Gamma = 0$. □


Example. In the case of the symmetric group $S_3$, Theorem 1 says that this group has exactly three irreducible complex representations. But we already know which ones these are: the table at the end of Subsection 1 §2 contains all of the necessary information. We note, in passing, that the squares of the dimensions of the representations $\Phi^{(1)}$, $\Phi^{(2)}$, $\Phi^{(3)}$ satisfy the relation $1^2 + 1^2 + 2^2 = 6 = |S_3|$. We shall now see that a similar equality holds in general.


2. The degrees of the irreducible representations. We consider the regular representation $(\rho, \langle e_g \mid g \in G \rangle_{\mathbb{C}})$ in somewhat greater detail. Let $R_h$ denote the matrix of the linear operator $\rho(h)$ in the given basis $\{e_g \mid g \in G\}$. Since $\rho(h)e_g = e_{hg}$, it follows that all of the diagonal elements of $R_h$ for $h \ne e$ are zero, and $\operatorname{tr} R_h = 0$. Hence,

$$\chi_\rho(e) = |G|, \qquad \chi_\rho(h) = 0, \quad \forall h \ne e . \tag{1}$$

Now let $(\Phi, V)$ be an arbitrary irreducible complex representation of $G$. By the corollary to Theorem 2 §4, the multiplicity with which $\Phi$ occurs in $\rho$ is equal to the scalar product $(\chi_\rho, \chi_\Phi)_G$. According to (1),

$$(\chi_\rho, \chi_\Phi)_G = |G|^{-1} \sum_{h \in G} \chi_\rho(h)\,\overline{\chi_\Phi(h)} = |G|^{-1}\,\chi_\rho(e)\,\overline{\chi_\Phi(e)} = |G|^{-1}\,|G|\,\chi_\Phi(e) = \dim V . \tag{2}$$

We see that every irreducible representation (considered up to equivalence) occurs in the regular representation with multiplicity equal to its degree. By Theorem 1, there are $r$ pair-wise non-equivalent irreducible representations

$$\Phi^{(1)}, \Phi^{(2)}, \ldots, \Phi^{(r)}$$

(where $r$ is the number of conjugacy classes in $G$) having characters

$$\chi_1, \chi_2, \ldots, \chi_r ,$$

and degrees

$$n_1, n_2, \ldots, n_r ; \qquad n_i = \chi_i(e) .$$

We usually take $\Phi^{(1)}$ to be the trivial representation, so that $\chi_1(g) = 1$, $\forall g \in G$. By (2), we have

$$\rho \approx n_1\Phi^{(1)} + \cdots + n_r\Phi^{(r)} ,$$

and hence

$$\chi_\rho = n_1\chi_1 + \cdots + n_r\chi_r .$$

In particular,

$$|G| = \chi_\rho(e) = n_1\chi_1(e) + \cdots + n_r\chi_r(e) = n_1^2 + \cdots + n_r^2 .$$

We have obtained the following theorem.


THEOREM 2. Each irreducible representation $\Phi^{(i)}$ occurs in the regular representation $\rho$ with multiplicity equal to its degree $n_i$. The order of the group $G$ and the degrees $n_1, \ldots, n_r$ of all of its non-equivalent irreducible representations are connected by the equality:

$$|G| = \sum_{i=1}^{r} n_i^2 . \tag{3}$$

For groups of small order, the elegant equality (3) is sufficient for determining all of the degrees $n_1, \ldots, n_r$, although, in general, we of course need additional information.

It is convenient to write the information about the characters of the irreducible representations (also called: the irreducible characters) in the form of a table

             e      g_2         g_3        ...    g_r
    χ_1     n_1   χ_1(g_2)   χ_1(g_3)    ...   χ_1(g_r)
    χ_2     n_2   χ_2(g_2)   χ_2(g_3)    ...   χ_2(g_r)
    ...
    χ_r     n_r   χ_r(g_2)   χ_r(g_3)    ...   χ_r(g_r)

which is called the character table. The first row of the character table contains representatives of all of the $r$ conjugacy classes $g_i^G$. For example, the character table for the group $S_3$ is:

             e    (12)   (123)
    χ_1      1     1       1
    χ_2      1    -1       1
    χ_3      2     0      -1

(compare with the table at the end of Subsection 1 §2).


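As usual, we let $C(g) = C_G(g)$ denote the centralizer of the element g in G. We know that $|C(g)|\,|g^G| = |G|$ (see Subsection 2 §2 Ch. 7). Hence, if we rewrite the first orthogonality relation (9) §4 in the form

    $\delta_{ij} = \frac{1}{|G|}\sum_{g \in G}\chi_i(g)\overline{\chi_j(g)} = \frac{1}{|G|}\sum_{k=1}^{r}|g_k^G|\,\chi_i(g_k)\overline{\chi_j(g_k)} = \sum_{k=1}^{r}\frac{\chi_i(g_k)}{\sqrt{|C(g_k)|}}\cdot\frac{\overline{\chi_j(g_k)}}{\sqrt{|C(g_k)|}},$

we see that the $r \times r$ matrix

    $M = \left(\frac{\chi_i(g_k)}{\sqrt{|C(g_k)|}}\right)$

is row-unitary. But a matrix is row-unitary if and only if it is column-unitary (since $M \cdot {}^t\overline{M} = E = {}^t\overline{M} \cdot M$), so that

    $\sum_{i=1}^{r}\frac{\chi_i(g_k)}{\sqrt{|C(g_k)|}}\cdot\frac{\overline{\chi_i(g_l)}}{\sqrt{|C(g_l)|}} = \delta_{kl},$

or, written out in more detail:

    $\sum_{i=1}^{r}\chi_i(g)\overline{\chi_i(h)} = \begin{cases} |C_G(g)|, & \text{if } g \text{ and } h \text{ are conjugate}, \\ 0 & \text{otherwise}. \end{cases}$    (4)

The equality (4) is called the second orthogonality relation for characters.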
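As a numerical illustration of (4), here is a small sketch that checks column orthogonality on the standard character table of $S_3$. The variable and function names (`X`, `class_sizes`, `column_gram`) are illustrative choices and not notation from the text.

```python
import numpy as np

# Standard character table of S_3: rows chi_1, chi_2, chi_3;
# columns are the classes of e, (12), (123).
X = np.array([[1, 1, 1],
              [1, -1, 1],
              [2, 0, -1]], dtype=complex)

class_sizes = np.array([1, 3, 2])
group_order = class_sizes.sum()
centralizer_orders = group_order // class_sizes   # |C(g)| = |G| / |g^G|


def column_gram(table):
    """Entry (k, l) is sum_i chi_i(g_k) * conj(chi_i(g_l))."""
    return table.T @ table.conj()


print(column_gram(X).real)
print(centralizer_orders)
```

The printed matrix should be diagonal, with the centralizer orders on the diagonal, which is exactly the content of (4).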


3. Representations of abelian groups. We can now generalize the description in Example 6 of §1 of the irreducible representations of cyclic groups to all finite abelian groups.

THEOREM 3. Every complex irreducible representation of a finite abelian group A is one-dimensional. The number of pair-wise non-equivalent representations is equal to $|A|$. Conversely, if every irreducible representation of a group A is one-dimensional, then A is abelian.

Proof. Since the number r of conjugacy classes in an abelian group A is equal to $|A|$, the first two assertions of the theorem follow from Theorem 2 (see also Exercise 3 §4). Next, suppose that all of the $n_i$ in (3) are equal to 1. This means that $r = |A|$, which implies A is abelian. □


Definition. Let A be an abelian group. The set

    $\hat{A} = \mathrm{Hom}(A, \mathbb{C}^*)$

of homomorphisms from the group A to the multiplicative group of the field of complex numbers, considered along with point-wise multiplication

    $(\chi_1\chi_2)(a) = \chi_1(a)\chi_2(a)$

($\chi_i \in \hat{A}$, $a \in A$), is called the character group of A over $\mathbb{C}$ ($\chi^{-1} = \overline{\chi}$).

THEOREM 4. The groups A and $\hat{A}$ are isomorphic.

Proof. From Theorem 3 we know that, in any case, $|\hat{A}| = |A|$. According to the results of §5 Ch. 7, the group A has a decomposition

    $A = A_1 \times A_2 \times \dots \times A_k$

into a direct product of cyclic groups $A_i = \langle a_i \rangle$ (we do not care whether they are primary or not; we shall write the group operation in A multiplicatively). If $|a_i| = s_i$ and $\varepsilon_i$ is a primitive $s_i$-th root of 1, then each element $a = a_1^{t_1} a_2^{t_2} \cdots a_k^{t_k}$ in A corresponds to the character $\chi_a \in \hat{A}$ defined by:

    $\chi_a(a_1^{u_1} \cdots a_k^{u_k}) = \varepsilon_1^{t_1 u_1} \cdots \varepsilon_k^{t_k u_k}.$

It is clear that $\chi_{ab} = \chi_a\chi_b$. If $a' = a_1^{t'_1} a_2^{t'_2} \cdots a_k^{t'_k} \ne a$, then there exists an index i with $t_i \ne t'_i$. Then

    $\chi_a(a_i) = \varepsilon_i^{t_i} \ne \varepsilon_i^{t'_i} = \chi_{a'}(a_i).$

Consequently, the characters $\chi_a$ are pair-wise distinct, and the map $a \mapsto \chi_a$ gives the required isomorphism between A and $\hat{A}$. □


The method of proof of Theorem 4 gives an explicit construction of all of the 


irreducible representations of an abelian group. 
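The construction in the proof can be sketched in code for a product of cyclic groups $Z_{s_1} \times \dots \times Z_{s_k}$, written additively by exponent vectors. This is only an illustration under that assumption; the function names `character` and `character_table` are hypothetical, and $\varepsilon_i$ is taken to be $e^{2\pi i/s_i}$.

```python
import cmath
from itertools import product


def character(t, orders):
    """chi_a for a = a_1^{t_1} ... a_k^{t_k}: u -> prod_i eps_i^(t_i * u_i),
    with eps_i = exp(2*pi*i / s_i)."""
    def chi(u):
        value = 1
        for t_i, u_i, s_i in zip(t, u, orders):
            value *= cmath.exp(2j * cmath.pi * t_i * u_i / s_i)
        return value
    return chi


def character_table(orders):
    """Elements (as exponent vectors) and the row of values of each chi_a."""
    elements = list(product(*(range(s) for s in orders)))
    rows = {t: [character(t, orders)(u) for u in elements] for t in elements}
    return elements, rows


orders = (2, 4)                       # A = Z_2 x Z_4, |A| = 8
elements, rows = character_table(orders)
distinct = {tuple((round(z.real, 9), round(z.imag, 9)) for z in row)
            for row in rows.values()}
print(len(elements), len(distinct))   # both counts equal |A|
```

Rounding is used only to compare floating-point values; the count of distinct rows agreeing with $|A|$ reflects the pair-wise distinctness of the $\chi_a$.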


Example. Let $V_{2^n}$ be an elementary abelian group of order $2^n$, and let $\chi$ be an irreducible complex character which is non-trivial, i.e., $\chi(a) \ne 1$ for some $a \in V_{2^n}$. Since every element in $V_{2^n}$ is assumed to have order 2, we have $\mathrm{Ker}\,\chi = B \cong V_{2^{n-1}}$, and so we can write $V_{2^n} = B \cup aB$ as the union of cosets of B; thus,

    $\chi(b) = 1, \quad \chi(ab) = -1 \quad$ for all $b \in B$.

For example, the Klein four-group $V_4 = \{e, a, b, ab\}$, whose representations were discussed in Problem 2 of §2 Ch. 1, has the following character table:

                 e     a     b     ab
    $\chi_1$     1     1     1     1
    $\chi_2$     1     1    -1    -1
    $\chi_3$     1    -1     1    -1
    $\chi_4$     1    -1    -1     1


The results concerning representations of abelian groups allow us to obtain some 


information on the representations of arbitrary finite groups. 


THEOREM 5. The one-dimensional representations of a finite group G over $\mathbb{C}$ are in one-to-one correspondence with the irreducible representations of the quotient group $G/G'$ (where $G'$ is the commutant of G). The number of such representations is equal to the index $(G : G')$.

Proof. We first make a general remark. Suppose that G is a group and K is a normal subgroup. If $\Phi$ is a representation of G with kernel $\mathrm{Ker}\,\Phi \supseteq K$, then we can define a representation $\overline{\Phi}$ of the quotient group $G/K$ by setting

    $\overline{\Phi}(gK) = \Phi(g).$

This is obviously a well-defined representation (see the proof of Theorem 1 §3 Ch. 7). Furthermore, $\mathrm{Ker}\,\overline{\Phi} = \mathrm{Ker}\,\Phi / K$. In particular, if $K = \mathrm{Ker}\,\Phi$, then we obtain a faithful representation $\overline{\Phi}$.

Conversely, any linear representation $\Psi$ of a group H induces a representation $\Phi$ of any group G which maps epimorphically onto H: $\pi : G \to H$. It suffices to set

    $\Phi(g) = \Psi(\pi(g)).$

Since $\pi$ is an epimorphism, it follows that $\Phi(G) = \Psi(H)$, and $\Phi$ and $\Psi$ are either both reducible or both irreducible. By Theorem 3 §3 Ch. 7, $\mathrm{Ker}\,\Phi = \pi^{-1}(\mathrm{Ker}\,\Psi)$. Given any one-dimensional representation $\Phi$ of a group G, we have the associated abelian (actually cyclic) group $\mathrm{Im}\,\Phi$, so that $\mathrm{Ker}\,\Phi \supseteq G'$. We now obtain the theorem as a simple consequence of Theorem 3, the above remark, and Theorem 4 §3 Ch. 7. □


4. Representations of certain special groups. Although, in principle, to obtain all 
of the irreducible representations of a finite group G_ it suffices to decompose the regular 
representation (Theorem 2), in practice this is not usually easy to do, and so one tries to 
find other methods. It is usually simplest first to construct the character table, and then 
the representations themselves (in this connection see §1 of Chapter 9). In any case, in 


the relatively simple examples below there is no need to resort to any subtle tricks. 


Example 1. Let G be an arbitrary 2-transitive group of permutations of the set $\Omega = \{1, 2, \dots, n\}$, $n > 2$ (see Example 3 §2 Ch. 7). Further let $\Phi$ be the natural representation of G in the space $V = \langle e_1, e_2, \dots, e_n \rangle$ with action defined by $\Phi(g)e_i = e_{g(i)}$ (see Example 5 §1). It is not hard to see that the value $\chi_\Phi(g)$ coincides with the number $N(g)$ of points $i \in \Omega$ (i.e., the number of basis vectors $e_i$) which are left fixed by g. By Theorem 3 §2 Ch. 7, we have

    $\sum_{g \in G}\chi_\Phi(g)\overline{\chi_\Phi(g)} = \sum_{g \in G}\chi_\Phi(g)^2 = \sum_{g \in G}N(g)^2 = 2|G|,$

which can obviously be rewritten in the form

    $(\chi_\Phi, \chi_\Phi)_G = 2.$    (5)

Comparing (5) with the relation (11) of §4, we conclude that $\Phi$ is a direct sum of two irreducible representations (since 2 = 1 + 1 is the only way of writing 2 as a sum of squares of natural numbers). But we also know that $\Phi = \Phi^{(1)} + \Psi$, where $(\Phi^{(1)}, U)$ is the unit (trivial) representation and $\Psi$ is the (n-1)-dimensional representation on the space $W = \langle e_1 - e_2, e_2 - e_3, \dots, e_{n-1} - e_n \rangle$. If we could decompose W into a direct sum $W = W_1 \oplus W_2$ of invariant subspaces, so that $V = U \oplus W_1 \oplus W_2$, then there would be more than two irreducible terms. We have thereby obtained the following non-trivial fact.

The natural complex linear representation $(\Phi, V)$ of a 2-transitive permutation group G is the sum of the unit representation and one other irreducible representation. □

In particular, each of the groups $S_n$, $n > 2$, and $A_n$, $n > 3$, has an (n-1)-dimensional irreducible complex representation $\Psi$, whose character $\chi_\Psi$ is given by the formula

    $\chi_\Psi(g) = N(g) - 1.$    (6)


As shown before in the case of $S_3$ (see Example 1 of Subsection 1 §2), the matrices $\Psi_g$ can be easily determined. To compute $\chi_\Psi(g)$ using formula (6) it is enough to know the cyclic structure of the permutation g. Here are a few examples:

    $A_4$            e    (12)(34)   (123)   (132)
    $\chi_\Psi$      3      -1         0       0

    $S_4$            e    (12)(34)   (12)   (123)   (1234)
    $\chi_\Psi$      3      -1         1      0       -1

    $A_5$            e    (12)(34)   (123)   (12345)   (12354)
    $\chi_\Psi$      4       0         1       -1        -1
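Formula (6) and relation (5) are easy to test by enumeration. The sketch below does this for $S_4$ acting on four points (numbered from 0 for convenience); the helper names `fixed_points` and `inner_square` are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import permutations


def fixed_points(g):
    """N(g): number of points i with g(i) = i, for g given as a tuple of images."""
    return sum(1 for i, image in enumerate(g) if i == image)


def inner_square(values):
    """(chi, chi)_G for a real-valued class function listed over all of G."""
    return Fraction(sum(v * v for v in values), len(values))


n = 4
group = list(permutations(range(n)))          # all of S_4
chi_phi = [fixed_points(g) for g in group]    # character of the natural representation
chi_psi = [v - 1 for v in chi_phi]            # formula (6)

print(inner_square(chi_phi), inner_square(chi_psi))
```

The first value is the 2 of relation (5); the second being 1 is the irreducibility criterion applied to $\chi_\Psi$.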


Example 2. The irreducible representations of the alternating group $A_4$. Let us gather together all of the facts we know. The group $A_4$ has four conjugacy classes. Representatives of these classes and the number of elements in them are given in the first two rows of the following table:

                 1      3          4          4
                 e    (12)(34)   (123)      (132)
    $\chi_1$     1      1          1          1
    $\chi_2$     1      1      $\varepsilon$    $\varepsilon^2$
    $\chi_3$     1      1      $\varepsilon^2$  $\varepsilon$
    $\chi_4$     3     -1          0          0

The commutant $A_4' = \{e, (12)(34), (13)(24), (14)(23)\} = V_4$ has index 3 in $A_4$, and so $A_4$ has three one-dimensional representations $\Phi^{(1)} = \chi_1$, $\Phi^{(2)} = \chi_2$, $\Phi^{(3)} = \chi_3$ (with kernel $A_4'$ and with $\varepsilon^3 = 1$, $\varepsilon \ne 1$) and one three-dimensional representation $\Phi^{(4)}$ ($12 = 1^2 + 1^2 + 1^2 + 3^2$). Comparing the tables for $A_4$ in Example 1 and in Example 2 of §4, we see that the representation $\Phi^{(4)}$ with character $\chi_4$ is equivalent to the representation $\Phi$ of the group $A_4$ by rotations (the tetrahedral group) and to the representation $\Psi$ which is connected to the 2-transitive group $A_4$.


Example 3. The irreducible representations of the symmetric group $S_4$. The first two rows of the table

                 1      3         6      8       6
                 e    (12)(34)   (12)   (123)   (1234)
    $\chi_1$     1      1         1      1       1
    $\chi_2$     1      1        -1      1      -1
    $\chi_3$     2      2         0     -1       0
    $\chi_4$     3     -1        -1      0       1
    $\chi_5$     3     -1         1      0      -1

are taken from Exercise 4 §3 Ch. 7. The representation $\Phi^{(1)} = \chi_1$ is the unit representation. The representation $\Phi^{(2)} = \chi_2$ is given by taking the sign of a permutation in $S_4$. Since $(S_4 : S_4') = 2$ (the example in Subsection 2 §3 Ch. 7), it follows that there are no other one-dimensional representations. The two-dimensional representation $\Phi^{(3)}$ with character $\chi_3$ and with kernel $V_4 \triangleleft S_4$ is obtained from the considerations in the proof of Theorem 5 and in Example 2 of Subsection 1 §3 Ch. 7. The representation $\Phi^{(4)}$ with character $\chi_4$ corresponds to the rotations of the cube (see the table for $S_4$ in Example 2 §4). The representation $\Phi^{(5)} = \Psi$ with character $\chi_5$ (see the table in Example 1) is the representation which is always connected with a 2-transitive group. It is also equivalent to the representation coming from all of the symmetry transformations of the tetrahedron $\Delta_4$ (rotations and reflections; it is these transformations which are of importance in describing the oscillations of the phosphorus molecule -- see Problem 2 of §2 Ch. 1).


Example 4. The irreducible representations of the quaternion group $Q_8$. Everything we need to know about $Q_8$ was discussed in Example 2 of Subsection 5 §3 Ch. 7. In particular, we described the two-dimensional irreducible representation $\Phi^{(5)}$ with character $\chi_5$ (of course, at that time we did not use our present terminology). The four one-dimensional representations have the commutant $\langle a^2 \rangle$ as their kernel, and are determined from the table in the example in Subsection 3.
EXERCISES 


1. Derive the relation (4) by explicitly writing out the expression $c_i = (\Gamma_j, \chi_i)_G$ for the coefficients in the expansion $\Gamma_j = \sum_i c_i\chi_i$ of the central function $\Gamma_j$ (see (7) §4) in terms of the irreducible characters.

2. Recall the isomorphism between a vector space V and the dual space V*, and also the natural identification of V with the double dual (V*)*. Verify that the map $\tau : A \to \hat{\hat{A}}$ defined by

    $a^\tau(\chi) = \chi(a),$

gives an isomorphism between the abelian groups A and $\hat{\hat{A}}$. This exercise, along with Theorem 4, gives part of the so-called duality law for finite abelian groups. A similar, but much deeper duality law for topological abelian groups, which led to important consequences, was established in the 1930's by L. S. Pontryagin.


3. Prove that, if a finite abelian group A has a faithful complex irreducible representation, then A is a cyclic group.

4. Let A be a finite abelian group, and let B be a subgroup. Prove that any character of B extends to a character of A, and that the number of possible extensions is equal to the index (A : B).

5. Justify the sentence before the parentheses at the end of Example 3 in Subsection 4.

6. What is the average value $\frac{1}{|G|}\sum_{g \in G}\chi(g)$ of a complex character $\chi$ on the elements of a finite group G?

7. Gather together the various tables concerning $A_5$ (see Example 2 in Subsection 2 §4, Exercise 4 §4, and Example 1), and make up a character table:

    e    (12)(34)    (123)    (12345)    (12354)

Describe the irreducible representations with characters $\chi_1$, $\chi_2$, $\chi_3$, and $\chi_4$. Fill in the last row of the table by using the second orthogonality relation (4) for characters.


8. Let $P = \{A^i B^j C^k;\ 0 \le i, j, k \le p-1\}$ be the group of order $p^3$ in Exercise 3 §2 Ch. Let $V = \langle e_0, e_1, \dots, e_{p-1} \rangle$ be a complex vector space of dimension p; let $\varepsilon$ be a primitive p-th root of 1; and let $\mathcal{A}$, $\mathcal{B}_l$, $\mathcal{C}_l$ be the linear operators on V defined by the relations


Ge, = 


aes oe koi i is cs oe 


(the subscripts of the basis elements are taken modulo p). Show that the map

    $\Phi^{(l)} : A \mapsto \mathcal{A}, \quad B \mapsto \mathcal{B}_l, \quad C \mapsto \mathcal{C}_l$

defines an irreducible linear representation of the group P. The representations $\Phi^{(1)}, \dots, \Phi^{(p-1)}$ are pair-wise non-equivalent and, together with the $p^2$ one-dimensional representations ($p^2$ is the index of the commutant $P' = \langle C \rangle$ in P), exhaust all of the complex irreducible representations of the group P.


9. Carry out the computations necessary to complete the following argument. Let $D_n = \langle a, b \mid a^n = e,\ b^2 = e,\ bab = a^{-1} \rangle$ be the dihedral group of order 2n, whose properties (including a description of the conjugacy classes) were given in Example 1 of Subsection 5 §3 Ch. 7. Since $\langle a \rangle \triangleleft D_n$, it follows that the maps $a \mapsto 1$, $b \mapsto 1$ and $a \mapsto 1$, $b \mapsto -1$ give two one-dimensional representations. Let $\varepsilon = e^{2\pi i/n}$ be a primitive n-th root of 1. Then the map

    $\Phi^{(j)} : a \mapsto \begin{pmatrix} \varepsilon^j & 0 \\ 0 & \varepsilon^{-j} \end{pmatrix}, \quad b \mapsto \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

defines a representation of degree 2. The representation $\Phi^{(j)}$ is irreducible for $j = 1, 2, \dots, [(n-1)/2]$ (where $[\alpha]$ denotes the greatest integer less than or equal to a real number $\alpha$). If n = 2m, the representation $\Phi^{(m)}$ splits into a direct sum of two one-dimensional representations: $a \mapsto -1$, $b \mapsto 1$ and $a \mapsto -1$, $b \mapsto -1$. This agrees with the fact that the commutant $D_{2m}'$ has index 4 in $D_{2m}$, and $D_{2m}/D_{2m}' \cong Z_2 \times Z_2$. All of these representations are irreducible, and they make up a complete set of complex irreducible representations of the dihedral group. Find a realization of the representation $\Phi^{(j)}$ over the reals. Give an explicit isomorphism (showing equivalence) $\Phi^{(k)} \approx \Phi^{(j)}$ for k > m, for suitably chosen j < m.


10. The crystallographic groups (see Problem 2 in §2 Ch. 1). Let E be n-dimensional Euclidean space, and let V be the associated vector space with the usual scalar product. Every rigid motion d of E corresponds to an orthogonal linear transformation $\overline{d} \in O(n)$, and we have $\overline{d_1 d_2} = \overline{d}_1\overline{d}_2$. A group D of rigid motions is called a crystallographic group if the D-orbit of any point is discrete (i.e., does not have any limit points), and if there exists a compact set $M \subset E$ for which

    $D(M) = \bigcup_{d \in D} d(M) = E.$

The Schoenflies-Bieberbach theorem states that for any crystallographic group D there exist n independent affine transformations which generate a normal subgroup L in D, where $\overline{D} = D/L$ is a finite group (a crystallographic point group). When n = 3, there are in all 32 geometrically different crystallographic point groups. Of course, among these groups are the groups containing reflections (improper rigid motions). It follows from the conditions defining a crystallographic group that every proper rotation in $\overline{D}$ is given by a matrix which is similar to

    $A = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$

and has $\mathrm{tr}\,A = 1 + 2\cos\theta \in \mathbb{Z}$. Using Theorem 2 §3 and the preceding observation, show that for n = 3 the only crystallographic point groups without reflections are the cyclic groups $C_1, C_2, C_3, C_4, C_6$, the dihedral groups $D_2, D_3, D_4, D_6$, the tetrahedral group T and the octahedral group O.
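The integrality condition on the trace in Exercise 10 can be explored numerically. The sketch below is only a hint toward the exercise, not a proof; the function name `admissible_orders` and the cut-off `max_order` are illustrative assumptions.

```python
import math


def admissible_orders(max_order=12, tol=1e-9):
    """Orders n for which the rotation by 2*pi/n has integer trace 1 + 2*cos(2*pi/n)."""
    result = []
    for n in range(1, max_order + 1):
        trace = 1 + 2 * math.cos(2 * math.pi / n)
        if abs(trace - round(trace)) < tol:
            result.append(n)
    return result


print(admissible_orders())
```

Only the orders 1, 2, 3, 4 and 6 pass the test within this range, which matches the subscripts of the cyclic groups listed in the exercise.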
§6. Representations of SU(2) and SO(3)

The concrete images of representations of SO(3) play a key role in the study of the physical world, since the action of SO(3) reflects the symmetry of many problems of physics. From a mathematical point of view, the action of SO(3) is of special interest in part because it induces an action on the space of solutions of the differential equation

    $\Delta f = 0,$

where $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$ is the Laplace differential operator. The two-dimensional analog of this was considered at the very beginning of the chapter (Problem 1).

Any element in SO(3) is a product of various operators $B_\varphi$ and $C_\theta$ of the form (1) in §1 Ch. 7. But $B_\varphi$ does not act on z, and $C_\theta$ does not act on x. Hence, the invariance of the equation $\Delta f = 0$ relative to $B_\varphi$ and $C_\theta$ follows from the computations which were made in the two-dimensional case. We conclude that the equation $\Delta f = 0$ is invariant relative to all of SO(3), i.e.,

    $\Delta f = 0 \implies \Delta(\Phi_g f) = 0, \quad \forall g \in SO(3),$

where $\Phi_g f$ is the function defined by:

    $(\Phi_g f)(x, y, z) = f(g^{-1}(x), g^{-1}(y), g^{-1}(z)).$    (1)


By assumption, if $g^{-1}$ is an orthogonal transformation with matrix $(a_{ij})$, then the column of the new variables has the form

    $\begin{pmatrix} g^{-1}(x) \\ g^{-1}(y) \\ g^{-1}(z) \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$

According to (1),

    $(\Phi_g(\Phi_h f))(x, y, z) = (\Phi_h f)(g^{-1}(x), g^{-1}(y), g^{-1}(z)) = f(h^{-1}g^{-1}(x), h^{-1}g^{-1}(y), h^{-1}g^{-1}(z)) = f((gh)^{-1}(x), (gh)^{-1}(y), (gh)^{-1}(z)) = (\Phi_{gh} f)(x, y, z).$

Hence,

    $\Phi_g\Phi_h = \Phi_{gh},$

i.e., the linear operators $\Phi_g$, $g \in SO(3)$, act on functions in such a way that the map $\Phi : g \mapsto \Phi_g$ is a representation of the group SO(3). This is a very natural method for constructing representations (which we actually used before when considering symmetric functions under the action of the group $S_n$). In principle, this method is suitable for a large class of groups, and it is typical of the techniques used in functional analysis. Starting from certain concrete conditions, one need only choose a suitable space of functions and then decompose it into irreducible invariant subspaces (this is a problem in harmonic analysis).
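The identity $\Phi_g\Phi_h = \Phi_{gh}$ for the action on functions can be spot-checked numerically. The sketch below assumes rotations about the z- and x-axes as sample elements of SO(3); the names `rot_z`, `rot_x`, `act` and `sample` are illustrative, and the test function is an arbitrary polynomial rather than anything from the text.

```python
import numpy as np


def rot_z(phi):
    """Rotation about the z-axis (leaves z fixed)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(theta):
    """Rotation about the x-axis (leaves x fixed)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def act(g, f):
    """Phi_g f: the function p -> f(g^{-1} p); for orthogonal g, g^{-1} = g^T."""
    g_inv = g.T
    return lambda p: f(g_inv @ p)


def sample(p):
    """An arbitrary polynomial function of (x, y, z)."""
    x, y, z = p
    return x * x * y + 3.0 * z


g, h = rot_z(0.7), rot_x(1.1)
p = np.array([0.3, -1.2, 0.5])
print(act(g, act(h, sample))(p), act(g @ h, sample)(p))
```

The two printed values agree up to rounding. Using $g^{-1}$ rather than $g$ inside `act` is what makes the composition come out as $\Phi_{gh}$ and not $\Phi_{hg}$.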

In the case of the group SO(3), when all of the irreducible representations are finite dimensional (this is true generally for compact groups, but we shall not treat the theory of compact groups in this exposition), we take for our space of functions the homogeneous polynomials

    $f(x, y, z) = \sum_{s,t} a_{st}\, x^s y^t z^{m-s-t}$

of degree m (m = 1, 2, 3, ...). These polynomials form a space $P_m$ of dimension $\binom{m+2}{2}$ (see Exercise 4 §2 Ch. 5). Since $\Delta f \in P_{m-2}$, it follows that the condition $\Delta f = 0$ is equivalent to $\binom{m}{2}$ linear conditions on the coefficients $a_{st}$. The solutions $f \in P_m$ of the equation $\Delta f = 0$ are called homogeneous harmonic polynomials of degree m. Since the operator $\Delta$ is linear, the harmonic polynomials form a subspace $H_m$ of dimension equal to $\binom{m+2}{2} - \binom{m}{2} = 2m+1$ (at this point we can only say that its dimension is $\ge 2m+1$, but in fact we have equality). By what we have said, $H_m$ is invariant under the action $\Phi = \Phi^{(m)}$ of the group SO(3). It turns out that we have the following fact: the space $H_m$ of the representation $\Phi^{(m)}$ is irreducible over $\mathbb{C}$, and any complex irreducible representation of SO(3) is equivalent to one of the representations $(\Phi^{(m)}, H_m)$ having odd dimension 2m+1. Rather than prove this theorem, we shall proceed to the group SU(2), where it is somewhat easier to obtain a family of irreducible representations. Because we have a natural epimorphism $SU(2) \to SO(3)$ whose kernel consists of the matrices $\pm E$ (see §1 Ch. 7), every representation $\Psi$ of SO(3) can also be considered as a representation of SU(2) (see the proof of Theorem 5 §5), which satisfies the so-called parity condition: $\Psi_{-E} = \Psi_E$. Of course, this means that we also have $\Psi_{-g} = \Psi_g$ for all $g \in SU(2)$. Conversely, any representation $\Psi$ of SU(2) which satisfies the parity condition can be considered as a representation of SO(3). The "double-valued" representations of SO(3), i.e., the representations of SU(2) which do not satisfy the parity condition, also have physical meaning. For example, the usual two-dimensional (spinor) representation is of this type.
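The dimension count 2m+1 for the harmonic polynomials can be confirmed for small m by writing the Laplace operator as a matrix from $P_m$ to $P_{m-2}$ in the monomial bases and computing its rank. This is a sketch with hypothetical helper names (`monomials`, `laplacian_matrix`, `harmonic_dimension`); it relies on a numerical rank computation, so it is a check for small degrees rather than a proof.

```python
import numpy as np


def monomials(m):
    """Exponent triples (s, t, u) with s + t + u = m, i.e. the monomials x^s y^t z^u."""
    return [(s, t, m - s - t) for s in range(m + 1) for t in range(m - s + 1)]


def laplacian_matrix(m):
    """Matrix of the Laplace operator P_m -> P_{m-2} in the monomial bases."""
    source = monomials(m)
    target = monomials(m - 2) if m >= 2 else []
    index = {e: i for i, e in enumerate(target)}
    matrix = np.zeros((len(target), len(source)))
    for col, exps in enumerate(source):
        for axis in range(3):
            k = exps[axis]
            if k >= 2:                       # second derivative of t^k is k(k-1) t^(k-2)
                image = list(exps)
                image[axis] -= 2
                matrix[index[tuple(image)], col] += k * (k - 1)
    return matrix


def harmonic_dimension(m):
    """dim H_m = dim P_m - rank of the Laplace operator on P_m."""
    matrix = laplacian_matrix(m)
    rank = np.linalg.matrix_rank(matrix) if matrix.size else 0
    return len(monomials(m)) - rank


print([harmonic_dimension(m) for m in range(6)])
```

For each degree tried the nullity equals 2m+1, consistent with the Laplace operator mapping $P_m$ onto $P_{m-2}$.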

We further note that any irreducible representation of SO(3) other than the trivial 
one is faithful, as immediately follows from the fact that SO(3) is simple (Theorem 6 


na Ch. 7). 


THEOREM 1. Let $V_n = \langle x^k y^{n-k} \mid k = 0, 1, \dots, n \rangle$ be the space of homogeneous polynomials of degree n in two complex variables with an action $\Psi^{(n)}$ of SU(2) defined by the rule

    $(\Psi^{(n)}_g f)(x, y) = f(\overline{\alpha}x - \beta y,\ \overline{\beta}x + \alpha y)$

for every element

    $g = \begin{pmatrix} \alpha & \beta \\ -\overline{\beta} & \overline{\alpha} \end{pmatrix} \in SU(2).$

Then $(\Psi^{(n)}, V_n)$ is an irreducible (n+1)-dimensional representation of SU(2). If n is even, then $(\Psi^{(n)}, V_n)$ is also an irreducible representation of the group SO(3).

Proof. Suppose that the polynomial

    $f(x, y) = \sum_{k=0}^{n} a_k x^k y^{n-k} \ne 0$

is contained in some invariant subspace $U \subset V_n$. Then also

    $\Psi^{(n)}_{b_\varphi} f = \sum_{k=0}^{n} a_k e^{i(n-2k)\varphi/2}\, x^k y^{n-k} \in U,$

where $b_\varphi$ is an element of SU(2) of the form (4) §1 Ch. 7. Since $\varphi$ is an arbitrary real number in the interval $(0, 2\pi)$, we can make up a linear system with Vandermonde determinant, from which it follows that

    $f(x, y) \in U \implies x^k y^{n-k} \in U$    (2)

for any monomial with coefficient $a_k \ne 0$. But if $x^k y^{n-k} \in U$ for some k, then also

    $\Psi^{(n)}_g(x^k y^{n-k}) = (\overline{\alpha}x - \beta y)^k (\overline{\beta}x + \alpha y)^{n-k} \in U.$

Taking a g with $\alpha\beta \ne 0$, from (2) we conclude that $x^n \in U$, which, in turn, gives us

    $\Psi^{(n)}_g(x^n) = (\overline{\alpha}x - \beta y)^n = \sum_{k=0}^{n} \binom{n}{k}\overline{\alpha}^k(-\beta)^{n-k}\, x^k y^{n-k} \in U.$

Since $\binom{n}{k}\overline{\alpha}^k(-\beta)^{n-k} \ne 0$, we have $x^k y^{n-k} \in U$ for all k, so $U = V_n$, and we have proved that $(\Psi^{(n)}, V_n)$ is irreducible.

Next, we have

    $\Psi^{(n)}_{-E}(x^k y^{n-k}) = (-x)^k(-y)^{n-k} = (-1)^n x^k y^{n-k},$

so that the parity condition holds for n = 2m (see the remark above), and $(\Psi^{(2m)}, V_{2m})$ can be considered as an irreducible (2m+1)-dimensional representation of SO(3). □
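The action in Theorem 1 can be made concrete by computing the matrix of $\Psi^{(n)}_g$ in the monomial basis. The sketch below assumes the parametrization $g = \begin{pmatrix} \alpha & \beta \\ -\overline{\beta} & \overline{\alpha} \end{pmatrix}$ of SU(2), so that f is evaluated at $(\overline{\alpha}x - \beta y,\ \overline{\beta}x + \alpha y)$; the function names `su2` and `psi` are illustrative.

```python
import numpy as np


def su2(alpha, beta):
    """g = [[alpha, beta], [-conj(beta), conj(alpha)]], assuming |alpha|^2 + |beta|^2 = 1."""
    return np.array([[alpha, beta], [-np.conj(beta), np.conj(alpha)]], dtype=complex)


def psi(n, g):
    """Matrix of Psi^(n)_g in the basis x^k y^(n-k), k = 0..n.
    A homogeneous polynomial is stored as an array c with c[j] = coefficient of x^j y^(d-j)."""
    alpha, beta = g[0, 0], g[0, 1]
    first = np.array([-beta, np.conj(alpha)])    # conj(alpha) x - beta y
    second = np.array([alpha, np.conj(beta)])    # conj(beta) x + alpha y
    matrix = np.zeros((n + 1, n + 1), dtype=complex)
    for k in range(n + 1):
        poly = np.array([1.0 + 0.0j])
        for _ in range(k):
            poly = np.convolve(poly, first)      # multiply by the first linear form
        for _ in range(n - k):
            poly = np.convolve(poly, second)     # multiply by the second linear form
        matrix[:, k] = poly                      # image of x^k y^(n-k)
    return matrix


g = su2(np.cos(0.4) * np.exp(0.3j), np.sin(0.4) * np.exp(-1.1j))
h = su2(np.cos(1.2) * np.exp(-0.7j), np.sin(1.2) * np.exp(0.2j))
minus_e = su2(-1.0, 0.0)
print(np.allclose(psi(3, g) @ psi(3, h), psi(3, g @ h)))
print(np.allclose(psi(2, minus_e), np.eye(3)), np.allclose(psi(3, minus_e), -np.eye(4)))
```

The first check is the homomorphism property; the second shows the parity condition holding for even n and failing for odd n, where $-E$ acts as $-1$.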


Actually, $\Psi^{(2m)}$ is equivalent to the representation $\Phi^{(m)}$ of SO(3) on the space of homogeneous harmonic polynomials of degree m, but we shall not prove that here; nor shall we show how to choose a basis of $V_n$ in which the representation $\Psi^{(n)}$ becomes unitary (this can be done). We shall only note that, borrowing the terminology of tensor analysis, the representation $\Psi^{(n)}$ of SU(2) can also be realized in the class of covariant symmetric tensors of rank n. A complete and transparent theory of the representations of compact groups, including SU(2) and SO(3), is usually given using the infinitesimal method, based on the correspondence between Lie groups and Lie algebras.


EXERCISES

1. Construct 2m+1 linearly independent homogeneous harmonic polynomials of degree m.

2. Show that any homogeneous polynomial $f \in P_m$ can be written as a linear combination of harmonic polynomials of degree m, m-2, m-4, ... with coefficients which are a function of $x^2 + y^2 + z^2$.

3. Use Exercise 2 to show that every polynomial function $g : (x, y, z) \mapsto g(x, y, z)$ on the sphere $S^2 : x^2 + y^2 + z^2 = 1$ can be decomposed into spherical functions -- the restrictions of harmonic polynomials to $S^2$.

4. Without using the complete description of the irreducible representations of SO(3), show that the only homomorphism $\pi : SO(3) \to SU(2)$ is the trivial one.
§7. Tensor products of representations

1. The dual representation. Let $(\Phi, V)$ be a complex representation of a group G. We consider the dual space V* (the space of linear functions on V) and set

    $(\Phi^*(g)f)(v) = f(\Phi(g^{-1})v); \quad f \in V^*,\ v \in V.$    (1)

We immediately verify that $\Phi^*(g)$ is a linear operator. Next, we choose dual bases of V and V*:

    $V = \langle e_1, \dots, e_n \rangle, \quad V^* = \langle f_1, \dots, f_n \rangle; \quad f_i(e_j) = \delta_{ij}.$

The matrix of the linear operator $\Phi^*(g)$ in the basis $f_1, \dots, f_n$ is the transpose of the matrix of the linear operator $\Phi(g^{-1})$ in the basis $e_1, \dots, e_n$:

    $\Phi^*_g = {}^t\Phi_{g^{-1}}.$    (2)
g 


Since

    $\Phi^*_{gh} = {}^t\Phi_{(gh)^{-1}} = {}^t(\Phi_{h^{-1}}\Phi_{g^{-1}}) = {}^t\Phi_{g^{-1}} \cdot {}^t\Phi_{h^{-1}} = \Phi^*_g\Phi^*_h,$

it follows that the relation (2) (or (1)) defines another linear representation $(\Phi^*, V^*)$ of G; it is called the dual (or contragredient) representation corresponding to $(\Phi, V)$. It is not hard to see (for example, from (2)) that $(\Phi^*)^* \approx \Phi$. It is possible for representations which are dual to one another to be equivalent. For example, if $(\Phi, V)$ is a real orthogonal representation, then $\Phi^*_g = {}^t\Phi_{g^{-1}} = {}^t(\Phi_g)^{-1} = \Phi_g$. But the representations $\Phi$ and $\Phi^*$ are usually not equivalent, as we can see from the simplest example:

    $C_3 = \langle a \mid a^3 = e \rangle; \quad \Phi(a) = \varepsilon, \quad \Phi^*(a) = \varepsilon^{-1} \quad (\varepsilon^2 + \varepsilon + 1 = 0).$

If G is a finite group, then we can obtain a precise criterion for a representation and its dual to be equivalent in terms of characters. Since the characteristic polynomials of the matrices A and ${}^tA$ coincide:

    $\det(\lambda E - {}^tA) = \det{}^t(\lambda E - A) = \det(\lambda E - A),$

it follows from the elementary properties of characters (the proposition in §4) that

    $\chi_{\Phi^*}(g) = \chi_\Phi(g^{-1}) = \overline{\chi_\Phi(g)}.$

In particular, a representation $\Phi$ whose character takes only real values is equivalent to $\Phi^*$. Note that, since we always have

    $(\chi_{\Phi^*}, \chi_{\Phi^*})_G = (\chi_\Phi, \chi_\Phi)_G,$

it follows that $\Phi$ and $\Phi^*$ are either both reducible or both irreducible.
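Relation (2) says that, in matrix terms, passing to the dual representation means taking the transpose of the inverse. A brief sketch, with the hypothetical helper `dual` and two arbitrarily chosen invertible matrices standing in for $\Phi_g$ and $\Phi_h$:

```python
import numpy as np


def dual(matrix):
    """Matrix of Phi*(g) given the matrix of Phi(g): the transpose of the inverse."""
    return np.linalg.inv(matrix).T


# Two arbitrary invertible matrices playing the roles of Phi_g and Phi_h.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[0.0, 1.0], [-1.0, 3.0]])
print(np.allclose(dual(A @ B), dual(A) @ dual(B)))   # multiplicativity of the dual

# The one-dimensional example on C_3: Phi(a) = eps, Phi*(a) = eps^{-1} = conj(eps).
eps = np.exp(2j * np.pi / 3)
phi_a = np.array([[eps]])
print(np.allclose(dual(phi_a), np.conj(phi_a)))
```

The transpose is what turns the order-reversing map $g \mapsto \Phi_{g^{-1}}$ back into a homomorphism; for the character of order 3 the dual is the complex conjugate, which differs from the original.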


2. Tensor products of representations. The following fact is normally proved in a course in linear algebra or algebraic geometry (see also Exercise 1 below).

THEOREM 1. Let V and W be vector spaces over a field K. Then there exist a vector space T over K and a bilinear map $\tau : V \times W \to T$ satisfying the following conditions:

(T1) if $v_1, \dots, v_k \in V$ are linearly independent and $w_1, \dots, w_k \in W$, then

    $\sum_{i=1}^{k}\tau(v_i, w_i) = 0 \implies w_1 = \dots = w_k = 0;$

(T2) if $w_1, \dots, w_k \in W$ are linearly independent and $v_1, \dots, v_k \in V$, then

    $\sum_{i=1}^{k}\tau(v_i, w_i) = 0 \implies v_1 = \dots = v_k = 0;$

(T3) $\tau$ is surjective, i.e.,

    $T = \langle \tau(v, w) \mid v \in V,\ w \in W \rangle.$

In addition, the pair $(\tau, T)$ is universal in the sense that, if $(\tau', T')$ is a pair consisting of a vector space T' and a bilinear map $\tau' : V \times W \to T'$, then there exists a unique linear map $\sigma : T \to T'$ such that $\tau'(v, w) = \sigma(\tau(v, w))$, $v \in V$, $w \in W$. □


If we had two such universal pairs $(\tau, T)$ and $(\tau', T')$, then we would find that the linear maps $\sigma : T \to T'$ and $\sigma' : T' \to T$ would actually be mutually inverse isomorphisms: $\sigma'\sigma = e_T$, $\sigma\sigma' = e_{T'}$. Thus, in that case $T \cong T'$, and the isomorphism $\sigma : T \to T'$ has the property in the theorem.

The pair $(\tau, T)$, which is uniquely determined up to isomorphism once V and W are given, is called the tensor product of V and W. We write $T = V \otimes_K W$, or simply $T = V \otimes W$, but we must also keep in mind that the space T is accompanied with a bilinear map $(v, w) \mapsto v \otimes w$ from $V \times W$ to T which satisfies conditions (T1)-(T3). Thus, the elements of the tensor product $V \otimes W$ are the formal linear combinations of ordered pairs $v \otimes w$ ($v \in V$, $w \in W$) with coefficients in K. Here the following relations are fulfilled:

    $(v_1 + v_2) \otimes w - v_1 \otimes w - v_2 \otimes w = 0,$
    $v \otimes (w_1 + w_2) - v \otimes w_1 - v \otimes w_2 = 0,$    (3)
    $\lambda v \otimes w - v \otimes \lambda w = 0$

    $(\lambda(v \otimes w) = \lambda v \otimes w = v \otimes \lambda w).$


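It is immediately clear from Theorem 1 that the bijective maps $v \otimes w \mapsto w \otimes v$, $(u \otimes v) \otimes w \mapsto u \otimes (v \otimes w)$, and $v \otimes \lambda \mapsto \lambda \otimes v \mapsto \lambda v$ give isomorphisms (called canonical isomorphisms) between the vector spaces:

    $V \otimes W \cong W \otimes V,$
    $(U \otimes V) \otimes W \cong U \otimes (V \otimes W),$
    $V \otimes K \cong K \otimes V \cong V.$

We also have the following distributive laws:

    $(U \oplus V) \otimes W \cong (U \otimes W) \oplus (V \otimes W),$
    $U \otimes (V \oplus W) \cong (U \otimes V) \oplus (U \otimes W).$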


In tensor analysis, where the above ideas originated, one studies tensor products of the following special form:

    $\underbrace{V^* \otimes \dots \otimes V^*}_{p} \otimes \underbrace{V \otimes \dots \otimes V}_{q}.$

The elements in such a tensor product are called tensors of type (p, q), p times covariant and q times contravariant. If we choose dual bases $e_1, \dots, e_n$ in V and $e^1, \dots, e^n$ in V*, then the elements $e^{i_1} \otimes \dots \otimes e^{i_p} \otimes e_{j_1} \otimes \dots \otimes e_{j_q}$ form a basis for the space of tensors of type (p, q). We usually think of a tensor as simply the set of coordinates $T^{j_1 \dots j_q}_{i_1 \dots i_p}$ in this basis, where we have rules for change of coordinates when passing from one basis to another. In this way one obtains the interpretation of such notions as a bilinear form and a linear operator in the language of tensors (in effect, in the language of matrices). We shall not dwell further on such matters, since our purpose is to discuss representations in the general situation of tensor products.


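Let $\mathcal{A} : V \to V$ and $\mathcal{B} : W \to W$ be linear operators. By their tensor product we mean the linear operator

    $\mathcal{A} \otimes \mathcal{B} : V \otimes W \to V \otimes W,$

which acts according to the rule

    $(\mathcal{A} \otimes \mathcal{B})(v \otimes w) = \mathcal{A}v \otimes \mathcal{B}w$    (4)

(and extends to all of $V \otimes W$ by linearity: $(\mathcal{A} \otimes \mathcal{B})(\sum_i v_i \otimes w_i) = \sum_i \mathcal{A}v_i \otimes \mathcal{B}w_i$). This definition is clearly compatible with the relations (3). For example,

    $\mathcal{A}(v_1 + v_2) \otimes \mathcal{B}w - \mathcal{A}v_1 \otimes \mathcal{B}w - \mathcal{A}v_2 \otimes \mathcal{B}w = (\mathcal{A}v_1 + \mathcal{A}v_2) \otimes \mathcal{B}w - \mathcal{A}v_1 \otimes \mathcal{B}w - \mathcal{A}v_2 \otimes \mathcal{B}w = 0.$

Hence the action of $\mathcal{A} \otimes \mathcal{B}$ on $V \otimes W$ is correctly defined. We also note the following relations, which follow directly from the definition (4):

    $(\mathcal{A} \otimes \mathcal{B})(\mathcal{C} \otimes \mathcal{D}) = \mathcal{A}\mathcal{C} \otimes \mathcal{B}\mathcal{D},$
    $(\mathcal{A} + \mathcal{C}) \otimes \mathcal{B} = \mathcal{A} \otimes \mathcal{B} + \mathcal{C} \otimes \mathcal{B},$
    $\mathcal{A} \otimes (\mathcal{B} + \mathcal{D}) = \mathcal{A} \otimes \mathcal{B} + \mathcal{A} \otimes \mathcal{D},$
    $\mathcal{A} \otimes \lambda\mathcal{B} = \lambda\mathcal{A} \otimes \mathcal{B} = \lambda(\mathcal{A} \otimes \mathcal{B}).$

We leave their verification to the reader.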
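As before, let $V = \langle e_1, \dots, e_n \rangle$ and $W = \langle f_1, \dots, f_m \rangle$. We obtain an $nm \times nm$ matrix for the operator $\mathcal{A} \otimes \mathcal{B}$, which we denote $A \otimes B$, in the basis

    $e_1 \otimes f_1, \dots, e_1 \otimes f_m,\ e_2 \otimes f_1, \dots, e_2 \otimes f_m,\ \dots,\ e_n \otimes f_1, \dots, e_n \otimes f_m$

if we note that

    $\mathcal{A}e_j = \sum_i \alpha_{ij}e_i, \quad \mathcal{B}f_l = \sum_k \beta_{kl}f_k \implies (\mathcal{A} \otimes \mathcal{B})(e_j \otimes f_l) = \sum_{i,k}\alpha_{ij}\beta_{kl}\, e_i \otimes f_k.$

Hence for $A = (\alpha_{ij})$ and $B = (\beta_{kl})$ we have

    $A \otimes B = (\alpha_{ij}B) = \begin{pmatrix} \alpha_{11}B & \alpha_{12}B & \dots & \alpha_{1n}B \\ \alpha_{21}B & \alpha_{22}B & \dots & \alpha_{2n}B \\ \dots & \dots & \dots & \dots \\ \alpha_{n1}B & \alpha_{n2}B & \dots & \alpha_{nn}B \end{pmatrix}.$

In particular, we have the trace formula

    $\mathrm{tr}\, A \otimes B = \alpha_{11}\,\mathrm{tr}\,B + \alpha_{22}\,\mathrm{tr}\,B + \dots + \alpha_{nn}\,\mathrm{tr}\,B = \mathrm{tr}\,A \cdot \mathrm{tr}\,B.$    (5)

We note in passing that

    $\det A \otimes B = \det(A \otimes E_m)(E_n \otimes B) = \det(A \otimes E_m) \cdot \det(E_n \otimes B) = (\det A)^m(\det B)^n.$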
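The block form of $A \otimes B$ is what NumPy calls the Kronecker product, so the trace and determinant formulas can be checked directly. In this sketch the matrices `A`, `B`, `C`, `D` are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])                               # n = 2
B = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]])    # m = 3
n, m = A.shape[0], B.shape[0]

K = np.kron(A, B)     # block matrix (alpha_ij * B), of size nm x nm

print(np.isclose(np.trace(K), np.trace(A) * np.trace(B)))            # trace formula (5)
print(np.isclose(np.linalg.det(K),
                 np.linalg.det(A) ** m * np.linalg.det(B) ** n))     # determinant formula

# (A ⊗ B)(C ⊗ D) = AC ⊗ BD for matrices of matching sizes.
C = np.array([[0.0, 1.0], [1.0, 1.0]])
D = np.eye(3) + np.ones((3, 3))
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
```

The exponents in the determinant formula are swapped relative to the sizes: $\det A$ is raised to the size of B and vice versa.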


Now let $(\Phi, V)$ and $(\Psi, W)$ be two linear representations of a group G with characters $\chi_\Phi$ and $\chi_\Psi$, respectively. We define the representation $(\Phi \otimes \Psi, V \otimes W)$ in the natural way, by setting

    $(\Phi \otimes \Psi)(g) = \Phi(g) \otimes \Psi(g), \quad \forall g \in G.$

The general properties of the tensor product of linear operators, along with (5), imply that the map $\Phi \otimes \Psi$ actually gives a representation of G with representation space $V \otimes W$ and character

    $\chi_{\Phi \otimes \Psi} = \chi_\Phi\chi_\Psi.$    (6)

We shall call $(\Phi \otimes \Psi, V \otimes W)$ the tensor product of the representations $(\Phi, V)$ and $(\Psi, W)$. If $\Psi = \Phi$ and W = V, we call it the tensor square of the representation $(\Phi, V)$. On the right in (6) we have the usual point-wise product of the central functions $\chi_\Phi$ and $\chi_\Psi$.
} W 
If U is a G-invariant subspace of V, then it is obvious that $U \otimes W$ is a G-invariant subspace of $V \otimes W$. The analogous remark applies to G-invariant subspaces of W. But irreducibility of V and W by no means implies irreducibility of $V \otimes W$, as we can see by the example of the tensor square $\Phi^{(3)} \otimes \Phi^{(3)}$ of the two-dimensional representation of $S_3$ (see the table in Subsection 2 §5). In fact, $\dim(\Phi^{(3)} \otimes \Phi^{(3)}) = 4$, and the maximum possible dimension of an irreducible representation of $S_3$ is 2.

The problem of effectively describing the irreducible representations contained in $\Phi \otimes \Psi$, or more generally in $\Phi^{(1)} \otimes \dots \otimes \Phi^{(p)}$, is fundamental, since many important and very natural group representations arise as tensor products. It is from this point of view that one should consider the representations of the groups SU(2) and SO(3) (see §6), and also Examples 3 and 4 of Subsection 2 §1. The invariant subspaces of symmetric and skew-symmetric covariant (or contravariant) tensors occur constantly in various geometrical applications. This problem is especially attractive when we have a complete reducibility theorem for the representations under consideration.


3. The ring of characters. For simplicity we limit ourselves to the case of complex representations of a finite group G. Let $\Phi^{(1)}, \Phi^{(2)}, \dots, \Phi^{(r)}$ be a complete set of pair-wise non-equivalent complex irreducible representations of G, and let $\chi_1, \chi_2, \dots, \chi_r$ be the corresponding characters (r is the number of conjugacy classes in G). We know that the representation $\Phi \otimes \Psi$, like any representation, has a decomposition

    $\Phi \otimes \Psi \approx m_1\Phi^{(1)} + \dots + m_r\Phi^{(r)},$

where the multiplicities $m_i$ only depend on the representations $\Phi$ and $\Psi$ whose tensor product we are studying. By (6), we have

    $\chi_\Phi\chi_\Psi = \sum_{i=1}^{r} m_i\chi_i.$

Let $X_{\mathbb{Z}}(G)$ be the set of all possible integral linear combinations of the characters $\chi_1, \dots, \chi_r$. Earlier we proved that $\chi_1, \dots, \chi_r$ form an orthonormal basis for the space $X_{\mathbb{C}}(G)$; hence, in any case, $X_{\mathbb{Z}}(G) \subset X_{\mathbb{C}}(G)$ is a free abelian group isomorphic to $\mathbb{Z}^r$ with generators $\chi_1, \dots, \chi_r$. We call the elements of this free abelian group the generalized characters of the group G. The only true characters are the linear combinations $\sum m_i\chi_i$ with all of the $m_i$ nonnegative.


From all this it is clear that tensor product of representations induces a binary operation on $X_{\mathbb{Z}}(G)$ which is commutative and associative and satisfies the distributive laws. To summarize, we have the following

THEOREM 2. The generalized characters form a commutative associative ring $X_{\mathbb{Z}}(G)$ whose unit is the trivial character $\chi_1$.

We say that $X_{\mathbb{C}}(G)$ is a commutative associative algebra of dimension r over $\mathbb{C}$. The structure of the ring $X_{\mathbb{Z}}(G)$ or the algebra $X_{\mathbb{C}}(G)$ is completely determined by the so-called structure constants -- the integers $m_{ij}^k$ in

    $\chi_i\chi_j = \sum_{k=1}^{r} m_{ij}^k\chi_k.$    (7)

In particular, the equalities $m_{ij}^k = m_{ji}^k$ and $m_{1j}^k = \delta_{jk}$ reflect the properties that $X_{\mathbb{Z}}(G)$ is commutative and $\chi_1$ is its identity element. According to (7), we have

    $\chi_i(g)\chi_j(g) = \sum_{l} m_{ij}^l\chi_l(g), \quad \forall g \in G;$

if we multiply both sides of this relation by $|G|^{-1}\overline{\chi_k(g)}$, sum over $g \in G$ and use the first orthogonality relation for characters, we obtain

    $m_{ij}^k = \frac{1}{|G|}\sum_{g \in G}\chi_i(g)\chi_j(g)\overline{\chi_k(g)}.$    (8)


Thus, the structure constants can be expressed in terms of the characters themselves. 
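Formula (8) can be evaluated directly from a character table. The following is a minimal sketch for $G = S_3$, assuming the standard character table of $S_3$ (classes of the identity, the transpositions and the 3-cycles); the names `CHARS`, `structure_constant` and `decompose` are illustrative only, and the numbering of the characters is an assumption that may differ from the tables referred to in the text.

```python
from fractions import Fraction

# Standard character table of S_3. Columns: class of e, of the
# transpositions, of the 3-cycles, with class sizes 1, 3, 2.
CLASS_SIZES = [1, 3, 2]
CHARS = [
    [1, 1, 1],    # trivial character
    [1, -1, 1],   # sign character
    [2, 0, -1],   # character of the two-dimensional representation
]
ORDER = sum(CLASS_SIZES)  # |G| = 6


def structure_constant(i, j, k):
    """Formula (8) with 0-based indices, summed class by class.

    All characters of S_3 are real, so complex conjugation is omitted.
    """
    total = sum(size * CHARS[i][c] * CHARS[j][c] * CHARS[k][c]
                for c, size in enumerate(CLASS_SIZES))
    return Fraction(total, ORDER)


def decompose(i, j):
    """Multiplicities of all irreducible characters in chi_i * chi_j."""
    return [structure_constant(i, j, k) for k in range(len(CHARS))]


print(decompose(2, 2))  # tensor square of the two-dimensional character
print(decompose(1, 2))  # sign character times the two-dimensional one
```

The first call expresses the square of the two-dimensional character as the sum of all three irreducible characters, each with multiplicity one.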


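From (8) we can derive the following simple fact:

$$m_{ij}^{1} = \frac{1}{|G|}\sum_{g \in G} \chi_i(g)\chi_j(g)\overline{\chi_1(g)} = \frac{1}{|G|}\sum_{g \in G} \chi_i(g)\chi_j(g) = \frac{1}{|G|}\sum_{g \in G} \chi_i(g)\overline{\chi_{j^*}(g)} = \delta_{i j^*},$$

where $\chi_{j^*} = \overline{\chi_j}$ is the character of the dual representation of $\Phi^{(j)}$ (see Subsection 1). Thus, the unit (trivial) representation occurs in $\Phi^{(i)} \otimes \Phi^{(j)}$ if and only if $\Phi^{(i)}$ is equivalent to the representation $\Phi^{(j^*)}$ dual to $\Phi^{(j)}$ (since otherwise $m_{ij}^{1} = (\chi_i, \chi_{j^*})_G = 0$).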


We further note that the tensor product of a one-dimensional representation $\Phi^{(i)}$ and an arbitrary irreducible representation $\Phi^{(j)}$ is always an irreducible representation having the same dimension as $\Phi^{(j)}$. This can be seen in many ways, for example, from the criterion for irreducibility in terms of characters. We write $\chi = \chi_i \chi_j$, where $\chi_i(g)$, since it is a root of 1, satisfies $\chi_i(g)\overline{\chi_i(g)} = 1$; hence,

$$(\chi, \chi)_G = \frac{1}{|G|}\sum_{g \in G} \chi_i(g)\chi_j(g)\overline{\chi_i(g)}\,\overline{\chi_j(g)} = \frac{1}{|G|}\sum_{g \in G} \chi_j(g)\overline{\chi_j(g)} = (\chi_j, \chi_j)_G = 1 .$$


Example 1. $G = S_3$ (see the tables in Subsection 1 §2 and Subsection 2 §5):


(1) (3) (2) 


~ @ @ 6) 


(3) 


@° 86 Le ® 


Example 2, G = S, (see Example 3 in Subsection 4 §5): 


(4) _ (5) ee) ey) 


an f,; ¢' S86 ' +6 


Finally, we prove the following curious theorem, which serves as a generalization of Theorem 2 §5 on the decomposition of the regular representation.

THEOREM 3. Let $\chi = \chi_\Phi$ be the character of a faithful representation $(\Phi, V)$ of a finite group $G$ over the field of complex numbers $\mathbb{C}$. Suppose that $\chi$ takes precisely $m$ distinct values on $G$. Then every irreducible character $\xi$ occurs with non-zero coefficient in the decomposition of at least one of the characters $\chi^0 = \chi_1, \chi, \chi^2, \ldots, \chi^{m-1}$. In other words, every irreducible representation occurs in at least one of the tensor powers $\Phi^{\otimes i} = \Phi \otimes \cdots \otimes \Phi$, $0 \le i \le m-1$, where $\Phi$ is a faithful representation.


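Proof. Let $\omega_j = \chi(g_j)$, $j = 0, 1, \ldots, m-1$, be the distinct values taken by $\chi$ on $G$, where $\omega_0 = \chi(e) = \deg \Phi$. Further let

$$G_j = \{ g \in G \mid \chi(g) = \omega_j \} .$$

Since $\Phi$ is faithful, we have $G_0 = \{e\}$.

Let $\xi$ be an irreducible character of $G$ which does not occur in the decomposition of any of the characters $\chi^i$, $0 \le i \le m-1$. Then

$$0 = |G|\,(\chi^i, \xi)_G = \sum_{g \in G} \chi(g)^i\,\overline{\xi(g)} = \sum_{j=0}^{m-1} \omega_j^{\,i} \sum_{g \in G_j} \overline{\xi(g)} = \sum_{j=0}^{m-1} \omega_j^{\,i}\, T_j, \quad i = 0, 1, \ldots, m-1,$$

is a homogeneous system of linear equations in the $T_j = \sum_{g \in G_j} \overline{\xi(g)}$ with determinant

$$\det(\omega_j^{\,i}) = \begin{vmatrix} 1 & 1 & \cdots & 1 \\ \omega_0 & \omega_1 & \cdots & \omega_{m-1} \\ \vdots & \vdots & & \vdots \\ \omega_0^{m-1} & \omega_1^{m-1} & \cdots & \omega_{m-1}^{m-1} \end{vmatrix},$$

which is non-zero (since it is a Vandermonde determinant with pairwise distinct $\omega_j$). Thus, $T_j = 0$, i.e.,

$$\sum_{g \in G_j} \overline{\xi(g)} = 0, \quad j = 0, 1, \ldots, m-1 .$$

In particular,

$$\xi(e) = \sum_{g \in G_0} \overline{\xi(g)} = 0 ,$$

which is impossible, since $\xi(e)$ is the degree of an irreducible representation. This contradiction proves the theorem. □

In the case of the regular representation we obviously have $m = 2$.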
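Theorem 3 can be checked on a small case. The sketch below again assumes the standard character table of $S_3$ and uses hypothetical helper names (`multiplicities`, `powers_cover_all`); it tests the character of the two-dimensional representation, which takes $m = 3$ distinct values, and the regular character, which takes $m = 2$.

```python
from fractions import Fraction

# Standard character table of S_3 (classes: e, transpositions, 3-cycles).
CLASS_SIZES = [1, 3, 2]
IRREDUCIBLES = [
    [1, 1, 1],    # trivial
    [1, -1, 1],   # sign
    [2, 0, -1],   # two-dimensional
]
ORDER = sum(CLASS_SIZES)


def multiplicities(chi):
    """Multiplicity of each irreducible character in the class function chi.

    The irreducible characters of S_3 are real, so no conjugation is needed.
    """
    return [Fraction(sum(s * a * b for s, a, b in zip(CLASS_SIZES, chi, irr)),
                     ORDER)
            for irr in IRREDUCIBLES]


def powers_cover_all(chi):
    """Return (m, covered): m is the number of distinct values of chi, and
    covered says whether chi^0, ..., chi^(m-1) together contain every
    irreducible character with non-zero multiplicity."""
    m = len(set(chi))
    seen = [False] * len(IRREDUCIBLES)
    for i in range(m):
        power = [x ** i for x in chi]   # chi^0 is the trivial character
        for k, mult in enumerate(multiplicities(power)):
            if mult != 0:
                seen[k] = True
    return m, all(seen)


print(powers_cover_all([2, 0, -1]))  # faithful two-dimensional character
print(powers_cover_all([6, 0, 0]))   # regular character
```

For the regular character the powers $\chi^0$ and $\chi^1$ already suffice, in agreement with the remark that $m = 2$ in that case.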


4. Invariants of linear groups. As usual, by a linear group of degree (or dimension) $n$ we mean a subgroup of $GL(n,K)$, where $K$ is a field. In what follows we shall take $K = \mathbb{R}$ or $\mathbb{C}$. If $G$ is an abstract group and $\Phi: G \to GL(n,\mathbb{C})$ is a linear representation, then we shall also call the pair $(G,\Phi)$ a linear group. The linear transformations $\Phi_g$ act on columns:

$$\begin{pmatrix} \Phi_g(x_1) \\ \vdots \\ \Phi_g(x_n) \end{pmatrix} = \Phi_g \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} .$$

These operators take any form (i.e., homogeneous polynomial) $f$ of degree $m$ into another form of degree $m$:

$$(\tilde\Phi_g f)(x_1, \ldots, x_n) = f\bigl(\Phi_{g^{-1}}(x_1), \ldots, \Phi_{g^{-1}}(x_n)\bigr) .$$

We have already encountered some special cases of this action (see §6). The map $\tilde\Phi$ gives a representation of the group $G$ in the space $P_m$ of forms of degree $m$ over $\mathbb{C}$ (i.e., in the space of covariant symmetric tensors of rank $m$).

Definition. A form $f \in P_m$ which remains fixed under the action of $\tilde\Phi$ (i.e., $\tilde\Phi_g f = f$, $\forall g \in G$) is called an (integral) invariant of degree $m$ of the linear group $(G,\Phi)$.

Actually, in the general theory of invariants one takes a degree $m$ polynomial with coefficients in "generic form" which remains fixed under the action of $\Phi(G)$; but for simplicity we shall use the above definition. If we take $f$ to be a rational function, then we arrive at the notion of a rational invariant. We also have the important concept of a relative invariant $f$, where we have

$$\tilde\Phi_g f = \omega_g f$$

with $\omega_g \in \mathbb{C}$ a factor which depends on the element $g \in G$.

It is clear that any set of invariants $\{f_1, f_2, \ldots\}$ of a linear group $(G, \Phi)$ generates a subring of invariants $\mathbb{C}[f_1, f_2, \ldots]$ in $\mathbb{C}[x_1, \ldots, x_n]$.


We now consider a few examples. 


Example 1. The quadratic form $x_1^2 + x_2^2 + \cdots + x_n^2$, along with all polynomials in this form, are integral invariants of the orthogonal group $O(n)$.

Example 2. The elementary symmetric polynomials $s_1(x_1, \ldots, x_n), \ldots, s_n(x_1, \ldots, x_n)$ are integral invariants of the symmetric group $S_n$, which we consider along with its canonical monomorphism $\Phi: S_n \to GL(n)$. The fundamental theorem on symmetric polynomials states that the invariants $s_1, \ldots, s_n$ of degrees $1, \ldots, n$, respectively, are algebraically independent, and the polynomial functions (rational functions) in these invariants exhaust all of the integral (respectively, rational) invariants of the group $(S_n, \Phi)$.

The skew-symmetric polynomials are relative invariants of the linear group $(S_n, \Phi)$: $\tilde\Phi_\pi f = (\det \Phi_\pi) f = \varepsilon_\pi f$. We have seen (Exercise 3 §2 Ch. 6) that any skew-symmetric polynomial $f$ has the form $f = \Delta_n \cdot g$, where $\Delta_n = \prod_{i<j} (x_j - x_i)$, and $g$ is an arbitrary symmetric polynomial, i.e., an absolute invariant.

Example 3. For the representation $\Phi_A: X \mapsto AXA^{-1}$ of degree $n^2$ of the general linear group $GL(n,K)$ with representation space $M_n(K)$ (see Example 3 §1) we have the following set of $n$ algebraically independent invariants: the coefficients of the characteristic polynomial of the matrix $X = (x_{ij})$. In particular, this set includes the well-known invariants $\operatorname{tr} X = \sum_i x_{ii}$ and $\det X$.


Example 4. The orthogonal group $O(n)$ acts as follows on a quadratic form $f(x_1, \ldots, x_n) = \sum_{i,j} a_{ij} x_i x_j$, which we write in the form $f(x_1, \ldots, x_n) = {}^tXAX$:

$$C:\ {}^tXAX \ \mapsto\ {}^t(C^{-1}X)\,A\,(C^{-1}X) = {}^tX\,{}^t(C^{-1})\,A\,C^{-1}X = {}^tX\,CAC^{-1}X .$$

In this case it is customary to speak of the invariants of the quadratic form $f$ relative to $O(n)$: $\operatorname{tr} A, \ldots, \det A$. In the case of the binary quadratic form $ax^2 + 2bxy + cy^2$, the invariants $a + c$ and $ac - b^2$, which distinguish second degree curves which are metrically different, are well known from basic analytic geometry.


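Example 5. We consider the symmetric group $S_3$ as a linear group of degree 2 by using the following representation $\Gamma$, which is equivalent to $\Phi^{(3)}$ in the table at the end of Subsection 1 §2:

$$\Gamma_{(123)} = \begin{pmatrix} \varepsilon & 0 \\ 0 & \varepsilon^{-1} \end{pmatrix}, \quad \Gamma_{(23)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}; \quad \varepsilon^2 + \varepsilon + 1 = 0$$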


(we obtain the equivalence by means of the conjugation 


é 0 € 0 
3 
5°) 


Go viiee alos lal 


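Let $u$ and $v$ be independent variables which $\tilde\Gamma$ transforms linearly as follows:

$$\tilde\Gamma_{(123)}(u) = \varepsilon^{-1}u, \quad \tilde\Gamma_{(123)}(v) = \varepsilon v, \quad \tilde\Gamma_{(23)}(u) = v, \quad \tilde\Gamma_{(23)}(v) = u .$$

Since

$$\tilde\Gamma_{(123)}(uv) = \tilde\Gamma_{(123)}(u)\,\tilde\Gamma_{(123)}(v) = \varepsilon^{-1}u \cdot \varepsilon v = uv,$$
$$\tilde\Gamma_{(23)}(uv) = vu = uv,$$
$$\tilde\Gamma_{(123)}(u^3 + v^3) = (\varepsilon^{-1}u)^3 + (\varepsilon v)^3 = u^3 + v^3,$$
$$\tilde\Gamma_{(23)}(u^3 + v^3) = v^3 + u^3 = u^3 + v^3,$$

it follows that the group $(S_3, \Gamma)$ has as invariants the forms

$$I_1 = uv, \quad I_2 = u^3 + v^3 \qquad (9)$$

of degrees 2 and 3.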
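Next, $S_3$ acts naturally on polynomials $f(x_1, x_2, x_3)$ in three independent variables:

$$(\tilde\Phi_\sigma f)(x_1, x_2, x_3) = f(x_{\sigma(1)}, x_{\sigma(2)}, x_{\sigma(3)}) .$$

If we set

$$u = x_1 + \varepsilon x_2 + \varepsilon^2 x_3, \quad v = x_1 + \varepsilon^2 x_2 + \varepsilon x_3, \qquad (10)$$

we see that

$$\tilde\Phi_\sigma(u) = x_{\sigma(1)} + \varepsilon x_{\sigma(2)} + \varepsilon^2 x_{\sigma(3)}, \quad \tilde\Phi_\sigma(v) = x_{\sigma(1)} + \varepsilon^2 x_{\sigma(2)} + \varepsilon x_{\sigma(3)} .$$

In particular,

$$\tilde\Phi_{(123)}(u) = x_2 + \varepsilon x_3 + \varepsilon^2 x_1 = \varepsilon^{-1}u, \quad \tilde\Phi_{(123)}(v) = x_2 + \varepsilon^2 x_3 + \varepsilon x_1 = \varepsilon v,$$
$$\tilde\Phi_{(23)}(u) = x_1 + \varepsilon x_3 + \varepsilon^2 x_2 = v, \quad \tilde\Phi_{(23)}(v) = x_1 + \varepsilon^2 x_3 + \varepsilon x_2 = u,$$

i.e., the action of $\tilde\Gamma$ on $u$ and $v$ and the action of $\tilde\Phi$ on $x_1, x_2, x_3$ are compatible. If we perform the substitution (10) in the invariants (9), those invariants become symmetric functions in the variables $x_1, x_2, x_3$ which, by Theorem 1 of §2 Ch. 6, can be expressed in terms of the elementary symmetric functions $s_k = s_k(x_1, x_2, x_3)$. It is a simple exercise to show that

$$I_1 = x_1^2 + x_2^2 + x_3^2 + (\varepsilon + \varepsilon^2)(x_1x_2 + x_1x_3 + x_2x_3) = s_1^2 - 3s_2,$$
$$I_2 = 2(x_1^3 + x_2^3 + x_3^3) - 3(x_1^2x_2 + x_1^2x_3 + x_1x_2^2 + x_1x_3^2 + x_2^2x_3 + x_2x_3^2) + 12x_1x_2x_3 = 2s_1^3 - 9s_1s_2 + 27s_3 .$$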
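The two identities for $I_1$ and $I_2$ are easy to confirm numerically. The following sketch substitutes (10) into (9) with floating-point complex arithmetic and compares the result with the expressions in $s_1, s_2, s_3$; the function names are illustrative, and agreement is only up to rounding error.

```python
import cmath

# A primitive cube root of unity, so that EPS**2 + EPS + 1 = 0 (up to rounding).
EPS = cmath.exp(2j * cmath.pi / 3)


def invariants_from_uv(x1, x2, x3):
    """I_1 = uv and I_2 = u^3 + v^3 after the substitution (10)."""
    u = x1 + EPS * x2 + EPS ** 2 * x3
    v = x1 + EPS ** 2 * x2 + EPS * x3
    return u * v, u ** 3 + v ** 3


def invariants_from_s(x1, x2, x3):
    """The same invariants written in elementary symmetric functions."""
    s1 = x1 + x2 + x3
    s2 = x1 * x2 + x1 * x3 + x2 * x3
    s3 = x1 * x2 * x3
    return s1 ** 2 - 3 * s2, 2 * s1 ** 3 - 9 * s1 * s2 + 27 * s3


for point in [(1, 0, 0), (1, 1, 1), (2, -5, 7)]:
    print(point, invariants_from_uv(*point), invariants_from_s(*point))
```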


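We specialize $I_1$ and $I_2$ by taking $x_1, x_2, x_3$ to be the three roots of the cubic equation

$$x^3 + px + q = 0 .$$

Then $s_1 = 0$, $s_2 = p$, and $s_3 = -q$, so that

$$I_1 = -3p, \quad I_2 = -27q . \qquad (11)$$

But it follows from (9) that $u^3$ and $v^3$ are the roots of the quadratic equation $t^2 - I_2 t + I_1^3 = 0$, so that

$$u = \sqrt[3]{\frac{I_2}{2} + \sqrt{\frac{I_2^2}{4} - I_1^3}}, \quad v = \sqrt[3]{\frac{I_2}{2} - \sqrt{\frac{I_2^2}{4} - I_1^3}} .$$

The radicals are chosen in such a way that, after substituting the values (11), we obtain the formulas

$$u = \sqrt[3]{-\frac{27}{2}q + \frac{3}{2}\sqrt{-3D}}, \quad v = \sqrt[3]{-\frac{27}{2}q - \frac{3}{2}\sqrt{-3D}},$$

in which $D = -4p^3 - 27q^2$ is the discriminant of our cubic equation (see (16) in §2 Ch. 6). Since we now know $u$ and $v$, we can find the roots themselves from the linear system

$$x_1 + \varepsilon x_2 + \varepsilon^2 x_3 = u,$$
$$x_1 + \varepsilon^2 x_2 + \varepsilon x_3 = v,$$
$$x_1 + x_2 + x_3 = 0 .$$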
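The procedure just described can be sketched numerically. The code below is an illustration under stated assumptions rather than a robust solver: it works in floating-point complex arithmetic, the name `cardano_roots` is hypothetical, the cube root $u$ is taken to be the principal one, and $v$ is then fixed by the relation $uv = I_1 = -3p$. Solving the linear system gives $x_1 = (u+v)/3$, $x_2 = (\varepsilon^2 u + \varepsilon v)/3$, $x_3 = (\varepsilon u + \varepsilon^2 v)/3$.

```python
import cmath


def cardano_roots(p, q):
    """Roots of x^3 + p*x + q = 0 from u, v with uv = -3p, u^3 + v^3 = -27q."""
    eps = cmath.exp(2j * cmath.pi / 3)        # primitive cube root of unity
    disc = -4 * p ** 3 - 27 * q ** 2          # the discriminant D
    root = 1.5 * cmath.sqrt(-3 * disc)
    u_cubed = -13.5 * q + root
    # Use whichever sign gives the larger |u^3|, so that u = 0 is avoided
    # unless p = q = 0.
    if abs(u_cubed) < abs(-13.5 * q - root):
        u_cubed = -13.5 * q - root
    if u_cubed == 0:
        return [0j, 0j, 0j]                   # x^3 = 0
    u = u_cubed ** (1 / 3)                    # principal cube root
    v = -3 * p / u                            # enforces uv = -3p
    return [(u + v) / 3,
            (eps ** 2 * u + eps * v) / 3,
            (eps * u + eps ** 2 * v) / 3]


# x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)
for r in cardano_roots(-7, 6):
    print(round(r.real, 6), round(r.imag, 6))
```

For this cubic $D = 400 > 0$, so $\sqrt{-3D}$ is purely imaginary even though all three roots are real.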


We have obtained in a natural way the formulas of Cardano which were mentioned in Problem 
Wot Meee Che ly 

The connection in this last example between the invariants of $S_3$, which is the Galois group of the general cubic equation, and Cardano's formulas is no accident. To a certain extent Galois theory is concerned with the study of invariants of fields (and their corresponding groups) which are generated by roots of algebraic equations.


We mention some facts about generators of the ring of invariants. Let $w$ be an arbitrary form in $n$ independent variables $x_1, \ldots, x_n$. A finite group $G$ with an $n$-dimensional linear representation $\Phi$ acts as a permutation group on the set

$$\Omega = \{ \tilde\Phi_g(w) \mid g \in G \} .$$

It is clear that any homogeneous symmetric function of $|G|$ (or a divisor of $|G|$) variables which take values in $\Omega$ is an invariant of the linear group $(G,\Phi)$. If we now let $w$ be the variable $x_i$, then $x_i$ is a root of the algebraic equation

$$\prod_{g \in G} \bigl(X - \tilde\Phi_g(x_i)\bigr) = 0,$$

whose coefficients are invariants of $(G,\Phi)$. Thus, each variable $x_i$ is an (algebraic) function of the invariants. If there were fewer than $n$ algebraically independent invariants, then we would be able to express $x_1, \ldots, x_n$ in terms of fewer than $n$ algebraically independent quantities, and this is impossible. We have thereby proved (if the reader accepts our rather bold use of the properties of algebraic independence) the following important theorem of invariant theory.

THEOREM 4. A finite linear group of degree $n$ always has a set of $n$ algebraically independent invariants. □

The forms (9) are such a set of invariants for the group $(S_3, \Gamma)$.

We could have included in Theorem 4 the fact that the full ring of integral invariants of a finite group of degree $n$ is generated by $n$ algebraically independent invariants $f_1, \ldots, f_n$ and, as a rule, one more invariant $f_{n+1}$ (which is an algebraic function of the first $n$ invariants). In other words, all integral invariants are polynomials in $f_1, \ldots, f_{n+1}$. This fact holds for many other linear groups, both discrete and continuous.


The general theory of invariants, which developed in the middle of the XIX century in the work of Cayley, Sylvester, Jacobi, Hermite, and others, and then experienced a second birth in several fundamental works of David Hilbert, has in modern times become a part of algebraic geometry and the theory of algebraic groups. The continual interest in the theory of invariants is partly explained by its wide applicability in many areas of physics and mechanics.


EXERCISES 


1. Prove Theorem 1, following the outline below (the notation is as in the statement of the theorem).

(a) If $V = \langle e_1, \ldots, e_n \rangle$ and $W = \langle f_1, \ldots, f_m \rangle$, then (T1)-(T3) combined are equivalent to the following single condition: the vectors $\tau(e_i, f_j)$, $1 \le i \le n$, $1 \le j \le m$, form a basis for the space $T$.

(b) For any $nm$-dimensional vector space $T$ over $K$ a map $\tau$ can be defined by setting $\tau(v,w) = \sum \alpha_i \beta_j t_{ij}$ for $v = \sum \alpha_i e_i$ and $w = \sum \beta_j f_j$, where the $t_{ij}$, $1 \le i \le n$, $1 \le j \le m$, form a basis of $T$. According to (a), the pair $(\tau, T)$ satisfies (T1)-(T3), and all such pairs are obtained in this way.

(c) Given any pair $(\tau', T')$ with a bilinear map $\tau': V \times W \to T'$, we define a linear map $\sigma: T \to T'$ by setting $\sigma(\sum \gamma_{ij} t_{ij}) = \sum \gamma_{ij}\, \tau'(e_i, f_j)$. According to (b) and (c), we have $\tau'(v,w) = \sum \alpha_i \beta_j\, \tau'(e_i, f_j) = \sigma(\sum \alpha_i \beta_j t_{ij}) = \sigma(\tau(v,w))$. Conversely, if $\sigma(\tau(v,w)) = \tau'(v,w)$, then $\sigma(t_{ij}) = \sigma(\tau(e_i, f_j)) = \tau'(e_i, f_j)$.
2. Show that one of the conditions (T1) or (T2) can be omitted from the set (T1)-(T3), and that, if we assume in advance that $\dim T = nm$, then only one of the three conditions is needed in the definition of the tensor product.

3. Prove the relation $\det(A \otimes B) = (\det A)^m (\det B)^n$ for an $n \times n$ matrix $A$ and an $m \times m$ matrix $B$ having complex coefficients, by using reduction to triangular form.


4. Using formula (8) and the tables in Subsection 1 §2, Subsection 2 §5, and Subsection 4 §5, verify the decomposition

$$\Phi^{(3)} \otimes \Phi^{(3)} \approx \Phi^{(1)} + \Phi^{(2)} + \Phi^{(3)}$$

for the tensor square of the two-dimensional representation $\Phi^{(3)}$ of $S_3$ and the decomposition

$$\Phi^{(5)} \otimes \Phi^{(5)} \approx \Phi^{(1)} + \Phi^{(2)} + \Phi^{(3)} + \Phi^{(4)}$$

for the tensor square of the two-dimensional representation $\Phi^{(5)}$ of the quaternion group $Q_8$.
5. Representations of the direct product of groups. Suppose that we have two groups $G$ and $H$ with linear representations $(\Phi, V)$ and $(\Psi, W)$, respectively. Then, setting

$$(\Phi \otimes \Psi)(g \cdot h) = \Phi(g) \otimes \Psi(h),$$

where $g \cdot h$ is an element of the direct product $G \times H$, we make $G \times H$ act on the tensor product $V \otimes W$; as usual,

$$(\Phi(g) \otimes \Psi(h))(v \otimes w) = \Phi(g)v \otimes \Psi(h)w .$$

Verify that the map defined in this way

$$\Phi \otimes \Psi : G \times H \to GL(V \otimes W)$$

is a representation of the group $G \times H$ with character $\chi_{\Phi \otimes \Psi} = \chi_\Phi \chi_\Psi$. Prove the following fact. Let $\Phi^{(1)}, \ldots, \Phi^{(r)}$ (respectively, $\Psi^{(1)}, \ldots, \Psi^{(s)}$) be all of the irreducible representations of $G$ (resp. $H$). Then the representations $\Phi^{(i)} \otimes \Psi^{(j)}$ of the group $G \times H$ are irreducible, and the representations of that form for $1 \le i \le r$ and $1 \le j \le s$ exhaust all of the irreducible representations of $G \times H$.


6. The forms $xy$ and $x^n + y^n$ are invariants of the two-dimensional linear dihedral group $(D_n, \Phi)$ (see Exercise 9 §5). Prove that any other integral invariant of $(D_n, \Phi)$ has the form of a polynomial in $xy$ and $x^n + y^n$.

7. Show that the quaternion group, considered in its irreducible two-dimensional representation, does not have quadratic or cubic invariants. What can be said about the forms $x^2y^2$ and $x^4 + y^4$?


Chapter 9. Toward a Theory of Fields, 
Rings and Modules 


A second look at some of the algebraic structures already studied is motivated by the 
following considerations. In the first place, it seems worthwhile to fill out our supply of 
facts on fields and rings, using, whenever necessary, a solid group-theoretic foundation. In 
the second place, the results of Chapter 8 on group representations fit in a natural way into 
the general theory of modules over a ring, and it would be a shame not to go into this at
least briefly. The fundamental concept of a module is important in its own right, and merits 


much deeper study, but for this we refer the reader to other sources. 


§1. Finite field extensions


1. Primitive elements and the degree of an extension. If $F$ is a field which contains a subfield $K$, then $F$ is called an extension of the field $K$ (see §4 Ch. 4). We shall consider the simplest case, when the extension $F = K(\theta)$ is obtained by adjoining a single element $\theta$. We say that $\theta$ is a primitive element for the extension $F$ of $K$. By its definition, $K(\theta)$ is the field of fractions of the integral domain $K[\theta]$. The element $\theta$ is said to be transcendental over $K$ (see §2 Ch. 5) if and only if $K(\theta)$ is isomorphic to the field of rational functions in one variable over $K$. On the other hand, if $\theta$ is an algebraic element, then $K(\theta) \cong K[X]/(f(X))$ (see (9) in §2 Ch. 5 and the corollary of Theorem 5 §2 Ch. 5). Here $f(X)$ is an irreducible polynomial of degree $n > 0$ having $\theta$ as a root. Conversely, if $f \in K[X]$ is an irreducible polynomial, then in a canonical way (see §3 Ch. 6) we construct a field $F$ in which $f$ has at least one root $\theta$.

We claim that in the algebraic case the field $F = K(\theta) \cong K[X]/(f(X))$ can be identified with the set of elements of the form

$$a_0 + a_1\theta + \cdots + a_{n-1}\theta^{n-1}, \quad a_i \in K, \quad n = \deg f .$$

This is clear for elements of $K[\theta]$ (if $g(X) \in K[X]$, simply divide $g(X)$ by $f(X)$ to obtain a remainder $r(X)$ of degree $< n$; then $g(\theta) = r(\theta)$). We can divide in $K[\theta]$ as follows: if $g(X) = a_0 + a_1X + \cdots + a_{n-1}X^{n-1} \ne 0$, then, since $f$ is irreducible, we have $\mathrm{g.c.d.}(f,g) = 1$, and there exist polynomials $u(X)$ and $v(X)$ of degrees $< n$ such that $fu + gv = 1$; hence $g(\theta)v(\theta) = 1$, and $1/g(\theta) = v(\theta)$. The number $n$ is the dimension of the vector space

$$F = \langle 1, \theta, \ldots, \theta^{n-1} \rangle_K$$

over the field $K$ with basis elements $1, \theta, \ldots, \theta^{n-1}$.
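The division procedure just described (find $v$ with $fu + gv = 1$) is the extended Euclidean algorithm. The following is a minimal sketch over $K = \mathbb{Q}$, with polynomials stored as coefficient lists, lowest degree first; all function names are illustrative, and the code assumes that $f$ is irreducible and that $g$ is not divisible by $f$.

```python
from fractions import Fraction


def trim(a):
    """Drop trailing zero coefficients."""
    while a and a[-1] == 0:
        a = a[:-1]
    return a


def poly_divmod(a, b):
    """Quotient and remainder of a by a non-zero polynomial b over Q."""
    a = trim([Fraction(x) for x in a])
    b = trim([Fraction(x) for x in b])
    quo = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] / b[-1]
        quo[shift] = c
        for i, bi in enumerate(b):
            a[i + shift] -= c * bi
        a = trim(a)
    return trim(quo), a


def poly_mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return trim(out)


def poly_sub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])


def inverse_mod(g, f):
    """Return v of degree < deg f with g*v = 1 modulo f.

    Invariant of the loop: r0 = v0*g and r1 = v1*g modulo f.
    """
    r0, v0 = trim([Fraction(c) for c in f]), []
    r1, v1 = poly_divmod(g, f)[1], [Fraction(1)]
    while r1:
        quo, rem = poly_divmod(r0, r1)
        r0, v0, r1, v1 = r1, v1, rem, poly_sub(v0, poly_mul(quo, v1))
    # Here r0 is a non-zero constant, because g.c.d.(f, g) = 1.
    return poly_divmod([c / r0[0] for c in v0], f)[1]


# 1/(1 + theta) in Q(theta), where theta^3 = 2, i.e. f(X) = X^3 - 2.
print(inverse_mod([1, 1], [-2, 0, 0, 1]))
```

In this example $(1+\theta)(1 - \theta + \theta^2) = 1 + \theta^3 = 3$, so the inverse is $(1 - \theta + \theta^2)/3$.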

If $F \supset K$ is an arbitrary field extension (not necessarily algebraic), we can still consider $F$ as a vector space over $K$. We let $[F:K]$ denote the dimension $\dim_K F$ (which may be infinite), and we call it the degree of the extension $F$ over $K$. If $F = K(\theta)$, then $[F:K]$ is also called the degree of the primitive element $\theta$. Clearly, if $\theta \in F$ is transcendental, then the set $1, \theta, \theta^2, \ldots$ is linearly independent over $K$, and so $[K(\theta):K] = \infty$. On the other hand, from what was said above we have the following fact.

THEOREM 1. Let $F$ be an extension of the field $K$. An element $\theta \in F$ is algebraic over $K$ if and only if $[K(\theta):K] < \infty$. In addition, if $\theta$ is algebraic, then $K(\theta) = K[\theta]$. □


If $L$, $F$, and $K$ are fields with $L \supset F \supset K$, we call them a (two-step) tower of extensions. We can then speak of the following three vector spaces: $L/K$ ($L$ as a vector space over $K$), $L/F$ ($L$ as a vector space over $F$), and $F/K$ ($F$ as a vector space over $K$). The dimensions of these vector spaces are connected by a relation which is reminiscent of the analogous formula for the index of one group in another.

THEOREM 2. In a tower of extensions $L \supset F \supset K$ the degree $[L:K]$ is finite if and only if both $[L:F]$ and $[F:K]$ are finite. In that case the following relation holds:

$$[L:K] = [L:F]\,[F:K] .$$


Proof. First suppose that $[L:F]$ and $[F:K]$ are finite. Choose a $K$-basis $f_1, \ldots, f_m$ for $F$ over $K$ and an $F$-basis $e_1, \ldots, e_n$ for $L$ over $F$. Then any element $x \in L$ can be written in the form $x = \sum_j \alpha_j e_j$ with $\alpha_j \in F$. Then, in turn, $\alpha_j = \sum_i \beta_{ij} f_i$ with $\beta_{ij} \in K$. Consequently, $x = \sum_{i,j} \beta_{ij} f_i e_j$, and we see that the $mn$ elements $f_i e_j$ span $L$ over $K$. Now suppose that there is a linear dependence relation $\sum_{i,j} \beta_{ij} f_i e_j = 0$ for some $\beta_{ij} \in K$. Then

$$0 = \sum_j \Bigl( \sum_i \beta_{ij} f_i \Bigr) e_j \ \Longrightarrow\ \sum_i \beta_{ij} f_i = 0,\ \forall j \ \Longrightarrow\ \beta_{ij} = 0,\ \forall i,j,$$

where we have used the linear independence of the $e_j$ over $F$ and the linear independence of the $f_i$ over $K$. Thus, the $mn$ elements $f_i e_j$ form a basis of the vector space $L$ over $K$, and we have $[L:K] = mn = [L:F][F:K]$.

Conversely, suppose that $[L:K]$ is finite. Then $[F:K]$ is finite, since $F$ is a subspace of $L$. Furthermore, if $a_1, \ldots, a_r$ is a $K$-basis for $L$, then any $x \in L$ is a linear combination of the $a_i$ with coefficients in $K$, and all the more with coefficients in $F$. The number of linearly independent elements of $\{a_1, \ldots, a_r\}$ over $F$ may even be less than $r$. We thereby see that $[L:F] < \infty$. □


COROLLARY. Let $F$ be an extension of the field $K$, and let $A$ be the set of all elements of $F$ which are algebraic over $K$. Then $A$ is a subfield of $F$ which contains $K$.

Proof. Every element $t \in K$ is a root of a linear polynomial $X - t \in K[X]$; hence $K \subset A$. Next, suppose that $u, v \in A$. Then, by Theorem 1, we have $[K(u):K] < \infty$. Since $v$ is algebraic over $K$, it is also algebraic over $K(u)$, i.e., $[K(u,v):K(u)] = [K(u)(v):K(u)] < \infty$. According to Theorem 2, we have $[K(u,v):K] = [K(u,v):K(u)]\,[K(u):K] < \infty$. Since $u - v$ and $uv$ are in $K(u,v)$, it follows, again by Theorem 1, that $u - v$ and $uv$ are in $A$, i.e., $A$ is a subring of $F$. $A$ is a field because, if $u \in A$ is non-zero, then $[K(u^{-1}):K] = [K(u):K] < \infty$. □


An extension $F \supset K$ is said to be algebraic over $K$ if all of the elements in $F$ are algebraic over $K$. Every element $\alpha$ of an algebraic extension is a root of some non-zero monic (i.e., leading coefficient 1) polynomial $f \in K[X]$, which depends on $\alpha$. If $f(\alpha) = 0$ and $g(\alpha) \ne 0$ for all non-zero $g \in K[X]$ with $\deg g < \deg f$, then we say that $f = f_\alpha$ is the minimal polynomial of $\alpha$. The minimal polynomial is uniquely determined, it is irreducible, and its degree is the same as the degree of the element $\alpha$. (Often a polynomial obtained from the minimal polynomial by multiplication by a constant is also called a minimal polynomial.) The various roots of the polynomial $f_\alpha$ are called the conjugates of $\alpha$. The justification for this terminology is given by Theorem 3 below. If $\operatorname{char} K = 0$, then the number of distinct roots coincides with $\deg f_\alpha$ (see §1 Ch. 6), but otherwise this is not always the case (see Exercises 4 and 5 below).


In agreement with the above results, we call an extension $F \supset K$ a finite algebraic extension if the degree $[F:K]$ is finite, i.e., if $F$ is obtained from $K$ by adjoining finitely many algebraic elements $\alpha_1, \ldots, \alpha_m$. Note that any extension $F$ which is obtained from $K$ by adjoining finitely many algebraic elements must be of finite degree, since an element $\alpha_i$ which is algebraic over $K$ is all the more algebraic over $K(\alpha_1, \ldots, \alpha_{i-1})$; hence $[K(\alpha_1, \ldots, \alpha_i) : K(\alpha_1, \ldots, \alpha_{i-1})] < \infty$, and, by Theorem 2,

$$[F:K] = [K(\alpha_1, \ldots, \alpha_m):K] = \prod_{i=1}^{m} [K(\alpha_1, \ldots, \alpha_i) : K(\alpha_1, \ldots, \alpha_{i-1})] < \infty .$$

In many cases (always when $\operatorname{char} K = 0$; see Exercise 13 below), a finite field extension can be obtained by adjoining a single primitive element. In the cases we consider, we will always be able to show directly the existence of a primitive element.


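Example. The field $F = \mathbb{Q}(\sqrt2, \sqrt3)$, as a vector space over $\mathbb{Q}$, is four-dimensional: $F = \langle 1, \sqrt2, \sqrt3, \sqrt6 \rangle_{\mathbb{Q}}$; in other words, every element $\alpha \in F$ can be written as a linear combination $\alpha = a + b\sqrt2 + c\sqrt3 + d\sqrt6$ with $a, b, c, d$ rational.

On the other hand, we also have $F = \langle 1, \theta, \theta^2, \theta^3 \rangle_{\mathbb{Q}}$, where $\theta = \sqrt2 + \sqrt3$. In fact, we have

$$\sqrt2 = \frac{\theta^3 - 9\theta}{2}, \quad \sqrt3 = \frac{11\theta - \theta^3}{2}, \quad \sqrt6 = \frac{\theta^2 - 5}{2} .$$

The primitive element $\theta$ has minimal polynomial $f_\theta(X) = X^4 - 10X^2 + 1$ with roots

$$\theta^{(1)} = \theta = \sqrt2 + \sqrt3, \quad \theta^{(2)} = \sqrt2 - \sqrt3, \quad \theta^{(3)} = -\sqrt2 + \sqrt3, \quad \theta^{(4)} = -\sqrt2 - \sqrt3 .$$

Notice that in this case $F$ is already the splitting field for the polynomial $f_\theta(X)$: we have

$$f_\theta(X) = (X - \theta^{(1)})(X - \theta^{(2)})(X - \theta^{(3)})(X - \theta^{(4)}) .$$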
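The relations in this example can be checked with exact arithmetic. In the sketch below an element $a + b\sqrt2 + c\sqrt3 + d\sqrt6$ is stored as the tuple `(a, b, c, d)`; the helper names `mul` and `combo` are illustrative and not part of the text.

```python
from fractions import Fraction


def mul(x, y):
    """Product in Q(sqrt2, sqrt3) relative to the basis 1, sqrt2, sqrt3, sqrt6."""
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,
            a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),   # sqrt3*sqrt6 = 3*sqrt2
            a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),   # sqrt2*sqrt6 = 2*sqrt3
            a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2)         # sqrt2*sqrt3 = sqrt6


def combo(*terms):
    """Q-linear combination of elements, given as (coefficient, element) pairs."""
    return tuple(sum(Fraction(c) * e[i] for c, e in terms) for i in range(4))


ONE = (1, 0, 0, 0)
THETA = (0, 1, 1, 0)                 # theta = sqrt2 + sqrt3
THETA2 = mul(THETA, THETA)
THETA3 = mul(THETA2, THETA)
THETA4 = mul(THETA2, THETA2)

print(THETA2, THETA3, THETA4)
print(combo((1, THETA4), (-10, THETA2), (1, ONE)))                # f_theta(theta)
print(combo((Fraction(1, 2), THETA3), (Fraction(-9, 2), THETA)))  # sqrt2
```

The second line printed is the zero element, confirming $\theta^4 - 10\theta^2 + 1 = 0$, and the third is the coordinate vector of $\sqrt2$.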


In Galois theory such a field is called normal. The diagram of subfields of $F$:

                      F
    Q(√2)           Q(√3)           Q(√6)
                      Q

resembles the diagram of subgroups of the Klein four-group $V_4$, and this is no accident. If we consider any automorphism $\Phi: F \to F$ (see Subsection 5 §4 Ch. 4), then the relations $\Phi(x+y) = \Phi(x) + \Phi(y)$, $\Phi(xy) = \Phi(x)\Phi(y)$, $\forall x, y \in F$, imply that $\Phi$ is completely determined by its action on the primitive element $\theta$. Since $\Phi(a) = a$, $\forall a \in \mathbb{Q}$, we have

$$\Phi(\theta)^4 - 10\,\Phi(\theta)^2 + 1 = \Phi(\theta^4 - 10\theta^2 + 1) = \Phi(0) = 0 .$$

Hence, $\Phi(\theta)$ is one of the roots $\theta^{(i)}$, $i = 1, 2, 3, 4$, and we conclude that the group $\operatorname{Aut}(F/\mathbb{Q})$ of all automorphisms, which is also called the Galois group $G(F/\mathbb{Q})$, has order $4 = [F:\mathbb{Q}]$. There are only two groups of order 4, up to isomorphism: the cyclic group $Z_4$ and $Z_2 \times Z_2 = V_4$. Direct computations show that $\operatorname{Aut}(F/\mathbb{Q}) \cong V_4$.


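The easiest way to see this is to consider the representation of $\operatorname{Aut}(F/\mathbb{Q})$ by permutations of the set $\Omega = \{1, 2, 3, 4\}$ which indexes the roots $\theta^{(i)}$. If, for example, $\sigma(\theta^{(1)}) = \theta^{(2)}$, then $\theta^{(1)}\theta^{(2)} = -1 \Rightarrow \sigma(\theta^{(1)})\,\sigma(\theta^{(2)}) = -1 \Rightarrow \sigma(\theta^{(2)}) = \theta^{(1)}$ and $\sigma(\theta^{(3)}) = \sigma(-\theta^{(2)}) = -\theta^{(1)} = \theta^{(4)}$, i.e., $(12)(34) = \sigma$. We similarly obtain the automorphisms $(13)(24) = \tau$ and $(14)(23) = \sigma\tau$.

It remains to observe that the cyclic subgroup $\langle\sigma\rangle$ leaves fixed every element of the intermediate field $\mathbb{Q}(\sqrt2)$, and $\langle\sigma\rangle$ is the group $G(F/\mathbb{Q}(\sqrt2))$ of all automorphisms of the field $F$ relative to the subfield $\mathbb{Q}(\sqrt2)$. Similarly, the "fixed fields" for the subgroups $\langle\tau\rangle$ and $\langle\sigma\tau\rangle$ are $\mathbb{Q}(\sqrt3)$ and $\mathbb{Q}(\sqrt6)$, respectively, and the Galois groups $G(F/\mathbb{Q}(\sqrt3))$ and $G(F/\mathbb{Q}(\sqrt6))$ are $\langle\tau\rangle$ and $\langle\sigma\tau\rangle$. In this special case we have verified that there is a one-to-one correspondence between subfields of a normal field $F$ and subgroups of its automorphism group.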


2. Isomorphism of splitting fields. In §3 Ch. 6, where we defined and constructed the splitting field $F$ over $K$ for a monic polynomial $f(X) = X^n + a_1X^{n-1} + \cdots + a_n \in K[X]$, we noted that there are various choices which are made in the construction. But it turns out that all splitting fields over $K$ of a given polynomial $f$ are isomorphic. In order to make a more precise statement, we consider a somewhat more general situation.

According to Theorem 3 §2 Ch. 5, any isomorphism $\varphi$ between a field $K$ and another field $\tilde K$ extends uniquely to an isomorphism $\tilde\varphi$ from $K[X]$ onto $\tilde K[X]$, so that

$$f(X) = X^n + a_1X^{n-1} + \cdots + a_n \ \mapsto\ \tilde f(X) = (\tilde\varphi f)(X) = X^n + \varphi(a_1)X^{n-1} + \cdots + \varphi(a_n) .$$

THEOREM 3. Suppose that $\varphi: K \to \tilde K$ is an isomorphism of fields; $f \in K[X]$ is a monic polynomial of degree $n > 0$, $\tilde f = \tilde\varphi f$ is its image under the isomorphism $\tilde\varphi$; $F$ and $\tilde F$ are splitting fields of the polynomials $f$ and $\tilde f$ over $K$ and $\tilde K$, respectively. Then $\varphi$ can be extended to an isomorphism $\Phi: F \to \tilde F$ in $k \le [F:K]$ ways, where $k = [F:K]$ if all of the roots of the polynomial $f(X)$ are distinct.


Proof. Step I. We first consider the case of arbitrary extensions $L \supset K$, $\tilde L \supset \tilde K$. Let $\theta \in L$ be an algebraic element with minimal polynomial $g = f_\theta \in K[X]$. We claim that the isomorphism $\varphi: K \to \tilde K$ extends to a monomorphism $\rho: K(\theta) \to \tilde L$ precisely when $\tilde g = \tilde\varphi g$ has a root in $\tilde L$, and the number of extensions is equal to the number of distinct roots of $\tilde g$ in $\tilde L$.

In fact, it follows from the existence of $\rho$ that the element $\rho(\theta)$ must be a root of $\tilde g$: $g(\theta) = 0 \Rightarrow \tilde g(\rho(\theta)) = \rho(g(\theta)) = 0$. Conversely, if $\tilde g(\omega) = 0$, then $\operatorname{Ker} \psi \supseteq g(X)K[X]$, where $\psi: K[X] \to \tilde L$ is the homomorphism defined by taking $u(X)$ to $\tilde u(\omega)$. As in the case of groups, $\psi$ induces a homomorphism $\bar\psi: K[X]/g(X)K[X] \to \tilde L$ (given by $u(X) + g(X)K[X] \mapsto \tilde u(\omega)$; if this is not completely clear, the reader should refer to the results below). Note that, since $g(X)$ is irreducible, the quotient ring $K[X]/g(X)K[X]$ is a field, and hence $\bar\psi$ is a monomorphism. In exactly the same way we define the isomorphism of fields $\pi: K[X]/g(X)K[X] \to K(\theta)$ (given by $u(X) + g(X)K[X] \mapsto u(\theta)$). The composite map $\rho = \bar\psi \circ \pi^{-1}$ is a monomorphism from $K(\theta)$ to $\tilde L$ (since $\rho(u(\theta)) = \tilde u(\omega)$). Since $K(\theta)$ is generated by $\theta$ over $K$, it follows that $\rho$ is the only extension of $\varphi$ which takes $\theta$ to $\omega$. But this means that the number of distinct monomorphisms $\rho$ with restriction $\rho|_K = \varphi$ is equal to the number of distinct roots of $\tilde g(X)$ in $\tilde L$.


Step II. The splitting field was constructed by successively adjoining roots of irreducible polynomials. We now use induction on the degree $[F:K]$.

If $[F:K] = 1$, then the polynomial $f$ splits into linear factors already in $K[X]$: $f(X) = (X - c_1) \cdots (X - c_n)$. In this case, $\tilde f(X) = (\tilde\varphi f)(X) = (X - \tilde c_1) \cdots (X - \tilde c_n)$ with $\tilde c_i = \varphi(c_i)$. The roots $\tilde c_1, \ldots, \tilde c_n$ of the polynomial $\tilde f$ are contained in $\tilde K$, and, since $\tilde F$ is generated over $\tilde K$ by these roots, we have $\tilde F = \tilde K$, so that $\Phi = \varphi$ is the only extension.

If $[F:K] > 1$, we factor $f(X)$ over $K$ into monic irreducible polynomials, where there must be at least one factor with degree $m > 1$. Let $g(X)$ be such a factor. Since

$$f(X) = g(X)h(X) \ \Longrightarrow\ \tilde f(X) = (\tilde\varphi f)(X) = \tilde g(X)\tilde h(X),$$

we have the following factorization of polynomials over the splitting fields $F$ and $\tilde F$:

$$g(X) = (X - \theta_1) \cdots (X - \theta_m),$$
$$\tilde g(X) = (X - \omega_1) \cdots (X - \omega_m) .$$

Since it is irreducible, $g(X)$ is the minimal polynomial of the element $\theta_1$ over $K$, and we have $[K(\theta_1):K] = m$.

If there are $l$ distinct elements among the $\omega_1, \ldots, \omega_m$, then, by Step I, we can find $l$ monomorphisms $\rho_1, \ldots, \rho_l$ from the extension $K_1 = K(\theta_1)$ to $\tilde F$ with $\rho_i|_K = \varphi$. Because of the way splitting fields are constructed, we may consider $F$ as a splitting field over $K_1$ of the polynomial $f \in K_1[X]$, and we may consider $\tilde F$ as a splitting field over $\rho_i(K_1)$ of the polynomial $\tilde f(X)$ for any $i = 1, 2, \ldots, l$. By Theorem 2, we have the inequality $[F:K_1] = [F:K]/m < [F:K]$, so that, by the induction assumption, each of the $\rho_i$ can be extended to an isomorphism $\Phi_{ij}: F \to \tilde F$, and the number of such extensions (the number of indices $j$) does not exceed $[F:K_1]$, and it is equal to $[F:K_1]$ if all of the roots of $f$ in $F$ are distinct. Since $\Phi_{ij}|_{K_1} = \rho_i$ and $\rho_i|_K = \varphi$, it follows that $\Phi_{ij}$ is an extension of $\varphi$, and $\rho_i \ne \rho_s \Rightarrow \Phi_{ij} \ne \Phi_{st}$ for $i \ne s$. Hence, altogether we obtain $k \le m[F:K_1] = [F:K]$ extensions of the isomorphism $\varphi$. This inequality becomes an equality if all of the roots of $f$ are distinct.


Step III. Finally, suppose that $\Phi: F \to \tilde F$ is an arbitrary extension of $\varphi$. As in Step II, the restriction $\Phi|_{K_1}$, which is a monomorphism from $K_1$ to $\tilde F$, coincides with one of the $\rho_i$, and in this case $\Phi$ coincides with one of the $\Phi_{ij}$. □
? 


COROLLARY 1. Any two splitting fields $F$ and $\tilde F$ over $K$ of a polynomial $f \in K[X]$ are isomorphic.

In fact, it suffices to set $\tilde K = K$ in Theorem 3 and take $\varphi$ to be the identity map from $K$ to itself. □

COROLLARY 2. The group $\operatorname{Aut}(F/K)$ of automorphisms of any splitting field $F$ over $K$ of a polynomial $f \in K[X]$ is finite and has order $\le [F:K]$. If all of the roots of $f(X)$ are distinct, then $|\operatorname{Aut}(F/K)| = [F:K]$.

This corollary is an immediate consequence of Theorem 3. □

Remark. Although the splitting field $F$ over $\mathbb{Q}$ (or over any other number field) of a polynomial $f \in \mathbb{Q}[X]$ can be considered imbedded in the complex numbers $\mathbb{C}$, and so is uniquely determined, Corollary 2 shows that, even in this case, it is worthwhile to have Theorem 3 (despite its unpleasant proof).


3. Finite fields. In addition to $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$, we have encountered other examples of finite fields (see §4 Ch. 4). It is now time to incorporate them into a general theory.

The first obvious remark concerns an arbitrary finite extension $K \supset F$ of a finite field $F$: if $|F| = q$ and $[K:F] = n$, then $|K| = q^n$. To see this, choose a basis for the vector space $K$ over $F$. Then $K$ can be identified with the space $F^n$ of rows $(a_1, \ldots, a_n)$ of length $n$. Since all of the coordinates take on any of the $q$ possible values in $F$ independently of one another, we have $|K| = |F^n| = q^n$, as claimed.

Our second remark is that any finite field $F$ has finite characteristic $p$ ($p$ is a prime), and $|F|$ is a power of $p$. In fact, since $F$ is finite, the prime field $P \subset F$ must be isomorphic to one of the fields $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. According to the first remark, the finite extension $F \supset P$ has cardinality $|F| = p^m$ (where $m$ is the degree of $F$ over $P$).


THEOREM 4. For every finite field $F$ and every positive integer $n$ there exists one and (up to isomorphism) only one extension $K \supset F$ of degree $[K:F] = n$.

Proof. (a) Uniqueness. Let $K \supset F$ be an extension of degree $n$. We know that $|F| = q = p^m$, where $p$ is a prime, and $|K| = q^n$. Consequently, the multiplicative group $K^* = K \setminus \{0\}$ has order $q^n - 1$, and, by Lagrange's theorem, the order of any element in this multiplicative group divides $q^n - 1$: $t^{q^n - 1} = 1$, $\forall t \ne 0$. This means that all of the elements of $K$ (including $t = 0$) are distinct roots of the polynomial $X^{q^n} - X$, and we have the factorization

$$X^{q^n} - X = \prod_{t \in K} (X - t) .$$

It is impossible to have such a factorization into linear terms over any proper subfield of $K$ with fewer than $q^n$ elements; hence, $K$ is the splitting field of the polynomial $X^{q^n} - X$. By Corollary 1 of Theorem 3, we have the required uniqueness of $K$.


(b) Existence. The argument in part (a) suggests a possible method for constructing $K$. We take $K$ to be the splitting field over $P = \mathbb{F}_p$ of the polynomial $f(X) = X^{q^n} - X$. Since $q = p^m$, we have $q \cdot 1 = 0$ in $K$. Hence $f'(X) = q^n X^{q^n - 1} - 1 = -1$, and by the well-known criterion (Theorem 4 §1 Ch. 6), $f(X)$ does not have multiple roots. This means that the subset $K_0 \subset K$ of roots of $f(X)$ has cardinality $|K_0| = q^n$. We claim that $K_0 = K$.

Since $K_0 \subset K$ and $\operatorname{char} K = p$, it follows by Exercise 8 §4 Ch. 4 that $(x \pm y)^{p^s} = x^{p^s} \pm y^{p^s}$ for any $x, y \in K$, $s = 0, 1, 2, \ldots$. In particular,

$$x, y \in K_0 \ \Longrightarrow\ (x \pm y)^{q^n} = x^{q^n} \pm y^{q^n} = x \pm y \ \Longrightarrow\ x \pm y \in K_0 .$$

In addition,

$$1 \in K_0; \quad (xy)^{q^n} = x^{q^n} y^{q^n} = xy \ \Longrightarrow\ xy \in K_0;$$
$$0 \ne x \in K_0 \ \Longrightarrow\ (x^{-1})^{q^n} = (x^{q^n})^{-1} = x^{-1} \ \Longrightarrow\ x^{-1} \in K_0 .$$

Thus, $K_0$ is a subfield of $K$ which contains $F$ (since all elements of $F$ are clearly roots of $f(X)$) and all of the roots of $f(X)$. Because of the definition of the splitting field we must have $K_0 = K$. We have $[K:F] = n$, because $|F|^{[K:F]} = |K| = |K_0| = q^n$. □


3 


COROLLARY. For every prime $p$ and every positive integer $n$, there exists one and (up to isomorphism) only one field with $p^n$ elements.

This is merely the special case of Theorem 4 when $|F| = p$. □

As we noted in §4 Ch. 4, it is customary to let $\mathbb{F}_q$ (or sometimes, in honor of Galois, $GF(p^n)$) denote the finite field with $q = p^n$ elements. We now prove some facts about finite fields.




THEOREM 5. (i) The multiplicative group $\mathbb{F}_q^*$ of the finite field $\mathbb{F}_q$ is a cyclic group of order $q - 1$.

(ii) The group $\mathrm{Aut}(\mathbb{F}_q)$ of automorphisms of the finite field $\mathbb{F}_q$ with $q = p^n$ elements is cyclic of order $n$, where

$$\mathrm{Aut}(\mathbb{F}_q) = \langle \Phi \mid \Phi(t) = t^p,\ \forall t \in \mathbb{F}_q \rangle.$$

(iii) If $\mathbb{F}_{p^d}$ is a subfield of $\mathbb{F}_{p^n}$, then $d \mid n$. Conversely, to every divisor $d$ of $n$ there corresponds precisely one subfield $\{t \in \mathbb{F}_{p^n} \mid \Phi^d(t) = t\} \cong \mathbb{F}_{p^d}$. The automorphisms of $\mathbb{F}_{p^n}$ which leave fixed all of the elements of this subfield $\mathbb{F}_{p^d}$ form a group $\mathrm{Aut}(\mathbb{F}_{p^n}/\mathbb{F}_{p^d}) = \langle \Phi^d \rangle$. Thus, there is a one-to-one correspondence between the subfields of $\mathbb{F}_{p^n}$ and the subgroups of its automorphism group (the correspondence of Galois theory).

(iv) If $q = p^n$ and $\mathbb{F}_q^* = \langle \theta \rangle$, then $\theta$ is a primitive element of the field $\mathbb{F}_q$ whose minimal polynomial $h(X)$ over $\mathbb{F}_p$ has degree $n$. $\mathbb{F}_q$ is the splitting field of $h(X)$ over $\mathbb{F}_p$.

(v) For any natural number $m$ there exists at least one irreducible polynomial of degree $m$ over $\mathbb{F}_q$.


Proof. (i) We shall prove a more general fact. Let $F$ be an arbitrary field, and let $A$ be a finite subgroup of the multiplicative group $F^*$. We can apply the results of §5 Ch. 7 to the abelian group $A$. In particular, we know that $A$ is cyclic if and only if $|A|$ coincides with the exponent $m$ of $A$, i.e., the least natural number such that $a^m = 1$, $\forall a \in A$. If $m < |A|$, then the polynomial $X^m - 1$ would have more than $m$ roots in $F$, and this is impossible. Hence, the group $A$ is cyclic.
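As a small computational illustration of part (i) in the simplest case $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$, the following Python sketch searches for a generator of $\mathbb{F}_p^*$. The names `prime_factors`, `multiplicative_order` and `find_generator` are ours, not the text's. The test it uses (an element $g$ of a cyclic group of order $p - 1$ is a generator exactly when $g^{(p-1)/r} \neq 1$ for every prime $r$ dividing $p - 1$) is a standard shortcut, not the exponent argument of the proof.

```python
def prime_factors(n):
    """Return the set of prime divisors of n (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def multiplicative_order(a, p):
    """Order of a in the group of non-zero residues modulo the prime p."""
    order, x = 1, a % p
    while x != 1:
        x = x * a % p
        order += 1
    return order


def find_generator(p):
    """Smallest g whose powers exhaust the non-zero residues modulo the prime p."""
    for g in range(1, p):
        # g generates iff g^((p-1)/r) != 1 for every prime r dividing p - 1
        if all(pow(g, (p - 1) // r, p) != 1 for r in prime_factors(p - 1)):
            return g
```

For instance, `find_generator(7)` returns 3, since the powers of 2 modulo 7 only run through 1, 2, 4.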


(ii) We shall regard $\mathbb{F}_q$ as a finite extension of degree $n$ of its prime field $\mathbb{F}_p = \mathbb{Z}_p$. Since $\mathbb{F}_q$ is the splitting field of the polynomial $X^{p^n} - X$, all of whose roots are distinct, it follows by Corollary 2 of Theorem 3 that $|\mathrm{Aut}(\mathbb{F}_q)| = n$. Because of the relations $(x + y)^p = x^p + y^p$, $(xy)^p = x^p y^p$, and $1^p = 1$, which we noted during the proof of Theorem 4, we see that the map $\Phi: t \mapsto t^p$ is an automorphism of the field $\mathbb{F}_q$ (the finiteness of $\mathbb{F}_q$ is essential here, for surjectivity). If $\Phi^s: t \mapsto t^{p^s}$ is the identity automorphism, then $t^{p^s} - t = 0$ for all $t \in \mathbb{F}_q$, which means that $s \geq n$. But we do obtain the identity automorphism when $s = n$; hence $|\langle \Phi \rangle| = n$, and $\langle \Phi \rangle = \mathrm{Aut}(\mathbb{F}_q)$.


(iii) According to our first remark concerning finite fields at the beginning of the subsection, we have $p^n = (p^d)^r$, where $r$ is the degree of the extension $\mathbb{F}_{p^n} \supset \mathbb{F}_{p^d}$. Hence $n = dr$. Conversely, for any $d \mid n$ we consider the subset $F = \{t \in \mathbb{F}_{p^n} \mid t^{p^d} = t\}$. Since $n = dr \implies p^n - 1 = (p^d)^r - 1 = (p^d - 1)s$ for some integer $s$, it follows that

$$X^{p^n - 1} - 1 = (X^{p^d - 1} - 1)\,g(X), \qquad X^{p^n} - X = (X^{p^d} - X)\,g(X)$$

for some polynomial $g(X)$. Because $\mathbb{F}_{p^n}$ is the splitting field of the polynomial $X^{p^n} - X$, precisely $p^d$ elements in $\mathbb{F}_{p^n}$ are roots of the polynomial $X^{p^d} - X$. This is our set $F$, which we can now identify with $\mathbb{F}_{p^d}$. This argument also shows uniqueness of the subfield with $p^d$ elements.

We note that, by construction,

$$\mathbb{F}_{p^d} = \{t \in \mathbb{F}_{p^n} \mid \Phi^d(t) = t\}$$

is the set of all elements which remain fixed under the action of $\Phi^d$. Since the group $\mathrm{Aut}(\mathbb{F}_{p^n}) = \langle \Phi \rangle$ is cyclic, it is immediately clear that any automorphism $\Phi^s$ not in $\langle \Phi^d \rangle$ does not act on $\mathbb{F}_{p^d}$ as the identity (simply apply $\Phi^s$ to a generator of the group $\mathbb{F}_{p^d}^*$). But this means that the group $\mathrm{Aut}(\mathbb{F}_{p^n}/\mathbb{F}_{p^d})$ of relative automorphisms coincides with $\langle \Phi^d \rangle$. The reference to Galois theory at the end of (iii) has the same sense as in the example in Subsection 1.


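(iv) It is obvious that $\mathbb{F}_q^* = \langle \theta \rangle \implies \mathbb{F}_q = \mathbb{F}_p(\theta)$. Let $h(X) = X^n + a_1 X^{n-1} + \cdots + a_n$ be the minimal polynomial of the primitive element $\theta$. Since the elements of the prime field $\mathbb{F}_p$ are fixed under all automorphisms, and $a_i \in \mathbb{F}_p$, it follows that the roots of $h(X)$ are $\theta, \theta^p, \theta^{p^2}, \ldots, \theta^{p^{n-1}}$. They are all contained in our field, and $\mathbb{F}_p(\theta, \ldots, \theta^{p^{n-1}}) = \mathbb{F}_p(\theta) = \mathbb{F}_q$ is the splitting field of $h(X)$ over $\mathbb{F}_p$.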


(v) Using Theorem 4, we construct the extension $K \supset \mathbb{F}_q$ of degree $m$. According to (i), $K^*$ is a cyclic group. If $K^* = \langle \theta \rangle$ and $h(X)$ is the minimal polynomial of the primitive element $\theta$ over the field $\mathbb{F}_q$, then $K = \mathbb{F}_q(\theta)$ and $\deg h(X) = [\mathbb{F}_q(\theta):\mathbb{F}_q] = [K:\mathbb{F}_q] = m$. By definition, the minimal polynomial is irreducible (over $\mathbb{F}_q$), so it satisfies the requirement of (v). □

After some simple number-theoretic preliminaries, we shall obtain an exact formula for the number of irreducible polynomials of degree $m$ over $\mathbb{F}_q$.


4. The Möbius inversion formula and its applications. The function $\mu$ on the positive integers which is defined by the formulas

$$\mu(n) = \begin{cases} 1, & \text{if } n = 1, \\ (-1)^k, & \text{if } n = p_1 \cdots p_k \text{ is a product of } k \text{ distinct primes}, \\ 0, & \text{if } n \text{ is divisible by a square greater than } 1, \end{cases}$$

is called the Möbius function. The function $\mu$ is clearly multiplicative in the sense that, if $n$ and $m$ are relatively prime, then $\mu(nm) = \mu(n)\mu(m)$. It is also clear that, if $n = p_1^{m_1} \cdots p_r^{m_r}$, then $\sum_{d \mid n} \mu(d) = \sum_{d \mid n_0} \mu(d)$, where $n_0 = p_1 \cdots p_r$ is the greatest square-free divisor of $n$. Note that the number of divisors $d = p_{i_1} \cdots p_{i_s}$ of $n_0$ with fixed $s$ is equal to $\binom{r}{s}$. Thus, for $n > 1$ we have:

$$\sum_{d \mid n} \mu(d) = \sum_{d \mid n_0} \mu(d) = \sum_{s=0}^{r} (-1)^s \binom{r}{s} = (1 - 1)^r = 0$$

(the summation on the left is over all divisors $d \geq 1$ of the integer $n$). We finally obtain:

$$\sum_{d \mid n} \mu(d) = \begin{cases} 1, & \text{if } n = 1, \\ 0, & \text{if } n > 1. \end{cases} \tag{1}$$

The following modification is also useful:

$$\sum_{n,\ d \mid n \mid m} \mu\!\left(\frac{m}{n}\right) = \begin{cases} 1, & \text{if } d = m, \\ 0, & \text{if } d \mid m \text{ and } d < m \end{cases} \tag{2}$$

(the summation is over $n$ dividing $m$ and divisible by $d$). If we set $m = dt$ and $n = d\ell$ and make $\ell$ run through the divisors of $t$, we easily derive (2) from (1).
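Formulas (1) and (2) are easy to check numerically. The Python sketch below is only an illustration; the helper names `mobius`, `divisors` and `sum_formula_2` are ours, and the Möbius function is computed by naive trial division rather than by any method described in the text.

```python
def mobius(n):
    """Möbius function of a positive integer, computed by trial division."""
    if n == 1:
        return 1
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one prime factor is left over
        result = -result
    return result


def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def sum_formula_2(d, m):
    """Left-hand side of (2): sum of mu(m/n) over n with d | n | m."""
    return sum(mobius(m // n) for n in divisors(m) if n % d == 0)
```

Summing `mobius(d)` over `divisors(n)` reproduces the right-hand side of (1), and `sum_formula_2(d, m)` reproduces that of (2).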

It would be possible to use formula (1) (or (2)) to define the Möbius function by induction. The value of this formula for us is contained in the following fact. Let $f$ and $g$ be any two functions from $\mathbb{N}$ to $M$ (where $M = \mathbb{Z}$, $\mathbb{R}$, $F[X]$, etc.) which are connected by the relation

$$f(n) = \sum_{d \mid n} g(d). \tag{3}$$

Then

$$g(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) f(d). \tag{4}$$

To prove this, we multiply both sides of (3) by $\mu(m/n)$ and sum over $n$ dividing $m$, using (2). We obtain

$$\sum_{n \mid m} \mu\!\left(\frac{m}{n}\right) f(n) = \sum_{n \mid m} \mu\!\left(\frac{m}{n}\right) \sum_{d \mid n} g(d) = \sum_{d \mid m} g(d) \sum_{n,\ d \mid n \mid m} \mu\!\left(\frac{m}{n}\right) = g(m).$$

A simple change of notation gives us (4), which is called the Möbius inversion formula. It is possible in a similar way to derive (3) from (4). □


There is also a multiplicative analog of the Möbius inversion formula. If

$$f(n) = \prod_{d \mid n} g(d),$$

then

$$g(n) = \prod_{d \mid n} f(d)^{\mu(n/d)}. \tag{5}$$

This is proved by making the same type of formal computations:

$$\prod_{n \mid m} f(n)^{\mu(m/n)} = \prod_{n \mid m} \prod_{d \mid n} g(d)^{\mu(m/n)} = \prod_{d \mid m} g(d)^{\sum_{n,\ d \mid n \mid m} \mu(m/n)} = g(m),$$

and then making a change of notation.

We shall give three examples of how the Möbius inversion formula is applied.


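Example 1. Euler's function. By definition, $\varphi(n)$ is the number of integers among $0, 1, 2, \ldots, n-1$ which are prime to $n$, or, equivalently, $\varphi(n) = |U(\mathbb{Z}_n)|$ is the order of the group of invertible elements in the ring $\mathbb{Z}_n = \mathbb{Z}/n\mathbb{Z}$. From Exercise 5 of §1 Ch. 8 we know the relation

$$n = \sum_{d \mid n} \varphi(d). \tag{6}$$

Using (4), we immediately obtain

$$\varphi(n) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) d = \sum_{d \mid n} \mu(d)\,\frac{n}{d} = n \sum_{d \mid n} \frac{\mu(d)}{d}.$$

If $n = p_1^{m_1} \cdots p_r^{m_r}$, then

$$\sum_{d \mid n} \frac{\mu(d)}{d} = 1 - \sum_{i} \frac{1}{p_i} + \sum_{i < j} \frac{1}{p_i p_j} - \cdots + \frac{(-1)^r}{p_1 p_2 \cdots p_r} = \left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_r}\right).$$

Thus,

$$\varphi(n) = n \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_r}\right),$$

a formula which we already gave in Exercise 3 of §8 Ch. 1 and which immediately implies the multiplicativity of the Euler function.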
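The two descriptions of $\varphi(n)$ in this example, and relation (6), can be compared on small values. In the Python sketch below the function names are ours; `phi_by_counting` follows the definition directly, while `phi_by_product` evaluates the product formula in integer arithmetic (replacing $n$ by $n - n/p$ for each prime divisor $p$).

```python
from math import gcd


def phi_by_counting(n):
    """Number of k in 0, 1, ..., n-1 that are prime to n."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)


def phi_by_product(n):
    """Evaluate n * prod(1 - 1/p) over the primes p dividing n."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p      # multiply by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                          # one prime factor is left over
        result -= result // m
    return result


def sum_phi_over_divisors(n):
    """Right-hand side of relation (6)."""
    return sum(phi_by_counting(d) for d in range(1, n + 1) if n % d == 0)
```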


Example 2. Cyclotomic polynomials. The splitting field $\Gamma_n$ of $X^n - 1$ over $\mathbb{Q}$ is called the cyclotomic field of $n$-th roots of unity. Since the $n$-th roots of 1 form a cyclic group of order $n$, it follows that this cyclotomic field has the form $\Gamma_n = \mathbb{Q}(\zeta)$, where $\zeta$ is any primitive $n$-th root ($\zeta \in \mathbb{C}$). We would like to determine $[\Gamma_n : \mathbb{Q}]$ and find the minimal polynomial of $\zeta$ over $\mathbb{Q}$.

Let $P_n$ denote the set of primitive $n$-th roots of 1, which has cardinality $|P_n| = \varphi(n)$. The subgroups of the cyclic group of order $n$ are in one-to-one correspondence with the divisors $d$ of $n$ (Theorem 6 of §3 Ch. 4), and each root $\varepsilon$ falls in exactly one of the sets $P_d$. Hence, we have the following partition into disjoint sets:

$$\{n\text{-th roots of } 1\} = \bigcup_{d \mid n} P_d \tag{7}$$

(note that if we take the cardinality of both sides, we again obtain (6)). The cyclotomic polynomial corresponding to $\Gamma_n$ is the polynomial

$$\Phi_n(X) = \prod_{\varepsilon \in P_n} (X - \varepsilon)$$

of degree $\varphi(n)$. Corresponding to (7) we have the factorization:

$$X^n - 1 = \prod_{d \mid n} \prod_{\varepsilon \in P_d} (X - \varepsilon) = \prod_{d \mid n} \Phi_d(X). \tag{8}$$

If we apply the multiplicative version of the Möbius inversion formula to (8), we obtain an explicit formula for $\Phi_n$:

$$\Phi_n(X) = \prod_{d \mid n} (X^d - 1)^{\mu(n/d)}. \tag{9}$$

For the first few values of $n$ we have

$$\Phi_1(X) = X - 1, \quad \Phi_2(X) = X + 1, \quad \Phi_3(X) = X^2 + X + 1,$$

$$\Phi_4(X) = X^2 + 1, \quad \Phi_5(X) = X^4 + X^3 + X^2 + X + 1, \quad \Phi_8(X) = X^4 + 1,$$

$$\Phi_6(X) = X^2 - X + 1, \quad \Phi_{10}(X) = X^4 - X^3 + X^2 - X + 1,$$

$$\Phi_{12}(X) = X^4 - X^2 + 1.$$

Note that

$$\Phi_n(X) \in \mathbb{Z}[X] \quad \text{and} \quad \Phi_n(0) = 1 \text{ for } n > 1. \tag{10}$$

To prove (10), we can either use (9) or else apply induction. The proof by induction is as follows. We have verified (10) for small $n$. By the induction assumption,

$$g(X) = \prod_{d \mid n,\ d \neq n} \Phi_d(X)$$

is a monic polynomial with integral coefficients and constant term $-1$. Using the division algorithm (Theorem 5 of §2 Ch. 5), we obtain uniquely determined polynomials $q, r \in \mathbb{Z}[X]$ such that $X^n - 1 = q(X) g(X) + r(X)$, $\deg r(X) < \deg g(X)$. But $X^n - 1 = \Phi_n(X) g(X)$ in $\mathbb{C}[X]$, and so $\Phi_n(X) = q(X) \in \mathbb{Z}[X]$, and $\Phi_n(X)$ is monic with constant term 1.
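The induction just described is effectively an algorithm: divide $X^n - 1$ by the polynomials $\Phi_d(X)$ for the proper divisors $d$ of $n$. The following Python sketch carries this out with integer coefficient tuples (lowest degree first). The names `divide_exact` and `cyclotomic` are ours, and the division routine assumes, as in the proof, that the divisor is monic and divides exactly.

```python
from functools import lru_cache


def divide_exact(num, den):
    """Quotient of integer polynomials (coefficients, lowest degree first).

    den must be monic; the remainder is asserted to be zero.
    """
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        c = num[i + len(den) - 1]          # leading coefficient still to cancel
        quot[i] = c
        for j, dj in enumerate(den):
            num[i + j] -= c * dj
    assert not any(num), "division was not exact"
    return tuple(quot)


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Phi_n as a tuple of integer coefficients, lowest degree first."""
    poly = (-1,) + (0,) * (n - 1) + (1,)   # X^n - 1
    for d in range(1, n):
        if n % d == 0:                     # strip off Phi_d for each proper divisor d
            poly = divide_exact(poly, cyclotomic(d))
    return poly
```

For example, `cyclotomic(12)` returns `(1, 0, -1, 0, 1)`, i.e. $X^4 - X^2 + 1$.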


We can say more: $\Phi_n(X)$ is irreducible over $\mathbb{Q}$, and so $\Gamma_n = \mathbb{Q}(\zeta)$ is an extension of degree $\varphi(n)$ with minimal polynomial $\Phi_n(X)$ for $\zeta$. We shall not prove this, but recall that at the end of §3 Ch. 5 we established the irreducibility of $\Phi_p(X) = (X^p - 1)/(X - 1) = X^{p-1} + X^{p-2} + \cdots + 1$, where $p$ is any prime number. It should be noted that the cyclotomic fields, which played a key role in the development of algebraic number theory, are still the subject of active research by many mathematicians.


Example 3. Irreducible polynomials over $\mathbb{F}_q$. Let $\Psi_d(q)$ be the total number of monic irreducible polynomials of degree $d$ over $\mathbb{F}_q$, $q = p^n$, and let $f(X)$ be one such polynomial. Its splitting field over $\mathbb{F}_q$ is isomorphic both to the quotient ring $\mathbb{F}_q[X]/f(X)\mathbb{F}_q[X]$ and to the splitting field of the polynomial $X^{q^d} - X$ (see the corollary to Theorem 4). Since the polynomials $X^{q^d} - X$ and $f(X)$ have a common root $\theta$, while $f(X)$ is irreducible, it follows that $X^{q^d} - X$ is divisible by $f(X)$. Since $X^{q^d} - X$ divides the polynomial $X^{q^m} - X$ for any $m = rd$, and since $X^{q^m} - X$ does not have multiple roots, we may conclude that each of the monic irreducible polynomials

$$f_{d,1}(X), \ldots, f_{d,\Psi_d(q)}(X)$$

of degree $d \mid m$ occurs exactly once in the factorization of $X^{q^m} - X$ over $\mathbb{F}_q$:

$$X^{q^m} - X = \prod_{d \mid m} \prod_{k=1}^{\Psi_d(q)} f_{d,k}(X). \tag{11}$$

Now taking the degrees of the polynomials in (11), we obtain the relation

$$q^m = \sum_{d \mid m} d\,\Psi_d(q),$$

from which we find an expression for $\Psi_m(q)$ by applying the Möbius inversion formula (4):

$$\Psi_m(q) = \frac{1}{m} \sum_{d \mid m} \mu\!\left(\frac{m}{d}\right) q^d. \tag{12}$$

For example, let $q = 2$. Then

$$\Psi_1(2) = 2, \quad \Psi_2(2) = \tfrac{1}{2}(2^2 - 2) = 1, \quad \Psi_3(2) = \tfrac{1}{3}(2^3 - 2) = 2,$$

$$\Psi_4(2) = \tfrac{1}{4}(2^4 - 2^2) = 3, \quad \Psi_5(2) = \tfrac{1}{5}(2^5 - 2) = 6, \quad \Psi_6(2) = \tfrac{1}{6}(2^6 - 2^3 - 2^2 + 2) = 9$$

(compare with Exercise 10 §1 Ch. 6). The formula (12) shows that a randomly chosen monic polynomial of degree $m$ over $\mathbb{F}_q$ has a probability of about $1/m$ of being irreducible. But there are no satisfactory criteria for determining in a concrete case whether or not a given polynomial is irreducible. For example, what can be said about irreducibility of the trinomial $X^n + X^k + 1$ over $\mathbb{F}_2$? Questions of this sort constantly arise in algebraic coding theory (see Problem 3 of §2 Ch. 1) and in the construction of pseudo-random sequences.
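Formula (12) can be checked against a direct count for $q = 2$. In the Python sketch below (all names are ours), a polynomial over $\mathbb{F}_2$ is stored as an integer whose bits are its coefficients, an encoding assumed here for convenience; `brute_force_count_f2` tests irreducibility naively by trial division, which is feasible only for small degrees.

```python
def mobius(n):
    """Möbius function of a positive integer, computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def count_irreducible(m, q):
    """Psi_m(q) from formula (12)."""
    total = sum(mobius(m // d) * q**d for d in range(1, m + 1) if m % d == 0)
    return total // m


def gf2_mod(a, b):
    """Remainder of a modulo b for polynomials over F_2 stored as bit masks."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a


def brute_force_count_f2(m):
    """Count degree-m polynomials over F_2 with no divisor of degree 1..m//2."""
    count = 0
    for f in range(1 << m, 1 << (m + 1)):          # all polynomials of degree m
        if all(gf2_mod(f, g) != 0 for g in range(2, 1 << (m // 2 + 1))):
            count += 1
    return count
```

The values `count_irreducible(m, 2)` for `m = 1, ..., 6` are 2, 1, 2, 3, 6, 9, in agreement with the direct count.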

Example 4. Constructions by ruler and compass. Let $K \subset CS$ be a constructive number field (see p. 219) which is a finite extension of $\mathbb{Q}$. We first suppose that $K$ is real, i.e., its elements are real numbers. In particular, $K$ has a primitive element $\theta$ (see Exercise 13) which is real and can be constructed (as the length of a line segment) in finitely many steps by ruler and compass. This means that $\theta$ is an element of a field $\mathbb{Q}(\theta_1, \theta_2, \ldots, \theta_r)$, where the degree $[\mathbb{Q}(\theta_1, \ldots, \theta_k) : \mathbb{Q}(\theta_1, \ldots, \theta_{k-1})]$ is at most 2 for each $k$. This is because $\theta_k$ is a solution to two equations with coefficients in $\mathbb{Q}(\theta_1, \ldots, \theta_{k-1})$ either for two lines, for a line and a circle, or for two circles. Now the results of subsection 1 concerning the degrees of algebraic extensions in towers show that $[\mathbb{Q}(\theta_1, \ldots, \theta_r) : \mathbb{Q}] = 2^m$, where $m \leq r$. Since $\mathbb{Q}(\theta) \subset \mathbb{Q}(\theta_1, \ldots, \theta_r)$, it follows from Theorem 2 that the degree $[\mathbb{Q}(\theta):\mathbb{Q}]$ is a power of two.

Turning to the case of $K$ not necessarily real, we again write it in the form $K = \mathbb{Q}(\theta)$. Now the primitive element $\theta = a + ib$ is a complex number whose real components $a$ and $b$ are constructive. Namely, if $f(X)$ is the minimal polynomial (with rational coefficients) for $\theta$, then $f(\theta) = 0$ and $f(\bar{\theta}) = \overline{f(\theta)} = 0$, where $\bar{\theta} = a - ib$. Then clearly $\mathbb{Q}(\theta, \bar{\theta})$ is a finite algebraic extension of $\mathbb{Q}$. Its elements $a = (\theta + \bar{\theta})/2$ and $ib = (\theta - \bar{\theta})/2$ are algebraic over $\mathbb{Q}$, and so is $b = ib/i$ (see the Corollary to Theorem 2), since of course $i^2 + 1 = 0$.

Thus, $\mathbb{Q}(a,b)$ is a finite real algebraic extension of $\mathbb{Q}$ with $a$ and $b$ constructive. According to the above, we have $[\mathbb{Q}(a,b):\mathbb{Q}] = 2^m$. Because $X^2 + 1$ is irreducible over $\mathbb{Q}(a,b) \subset \mathbb{R}$, it follows that $[\mathbb{Q}(a,b)(i):\mathbb{Q}(a,b)] = 2$, and so $[\mathbb{Q}(a,b)(i):\mathbb{Q}] = 2^{m+1}$. Since $K = \mathbb{Q}(\theta) \subset \mathbb{Q}(a,b,i)$, the degree $[\mathbb{Q}(\theta):\mathbb{Q}]$ must divide $2^{m+1}$.


We have proved the following important fact.

If a constructive number field $K$ is a finite algebraic extension of $\mathbb{Q}$, then $[K:\mathbb{Q}] = 2^n$ for some nonnegative integer $n$.


This enables one to answer various questions that were raised by mathematicians 


in ancient times. 


a) Is it possible to construct (using ruler and compass) the edge of a cube 
having volume 2 (the Indian problem of doubling the cube)? Here it is assumed that 


we are given a cube of unit volume. The polynomial $X^3 - 2$, a root of which is the length of the desired side, is irreducible over $\mathbb{Q}$; hence $[\mathbb{Q}(\sqrt[3]{2}):\mathbb{Q}] = 3 \neq 2^n$. Therefore, this question has a negative answer.


b) Is it possible to divide any angle into three equal parts using ruler and compass (the problem of trisecting an angle)? The answer is negative even for the specific angle $60°$. Namely, to construct $\varphi = 20°$ would mean we could construct $\cos\varphi$ and $2\cos\varphi$. But by de Moivre's theorem, $\tfrac{1}{2} = \cos 60° = \cos 3\varphi = 4\cos^3\varphi - 3\cos\varphi$, so that $\theta = 2\cos\varphi$ is a root of the polynomial $f(X) = X^3 - 3X - 1$. Since 1 and $-1$ are not roots of $f(X)$, it follows that the polynomial $f(X)$ is irreducible over $\mathbb{Q}$ (see Exercise 8 of §4 Ch. 6), and so $[\mathbb{Q}(\theta):\mathbb{Q}] = 3 \neq 2^n$.
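The two computational claims in b) can be confirmed with a few lines of Python; this is only a numerical sanity check in floating point, not a proof, and the names `f`, `theta` and `rational_candidates` are ours.

```python
import math


def f(x):
    """The cubic X^3 - 3X - 1 from the trisection argument."""
    return x**3 - 3 * x - 1


# theta = 2 cos(20 degrees) should be a root of f, up to rounding error
theta = 2 * math.cos(math.radians(20))

# a rational root of a monic integer polynomial is an integer dividing the
# constant term, so only 1 and -1 need to be tried
rational_candidates = [1, -1]
has_rational_root = any(f(c) == 0 for c in rational_candidates)
```

Since a cubic with no rational root has no linear factor over $\mathbb{Q}$, the value `has_rational_root == False` is exactly what the irreducibility argument uses.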


c) Similar arguments show that for various values of $n$ one cannot construct a regular $n$-gon by ruler and compass. For example, if $n = 7$ it is not hard to see that the number $\theta = 2\cos\frac{2\pi}{7}$ is a root of the polynomial $X^3 + X^2 - 2X - 1$, which is irreducible over $\mathbb{Q}$.

The great Gauss, at the very beginning of his mathematical career, found necessary and sufficient conditions on $n$ in order that a regular $n$-gon can be constructed by ruler and compass. In particular, for $n$ a prime he found that it must be a Fermat prime $n = 2^{2^k} + 1$. The complete solution of this problem is connected with the study of the Galois group of cyclotomic fields (see Example 2).


EXERCISES 


1. Show that an extension $F \supset K$ of prime degree does not have any intermediate subfields besides $F$ and $K$.

2. Find a primitive element for the extension $\mathbb{Q}(\sqrt{p}, \sqrt{q})$, where $p$ and $q$ are prime numbers.
3. Find the degree over $\mathbb{Q}$ of the splitting field of the polynomial $X^p - 2$.


4. Show that over a field $K$ of characteristic $p > 0$ there are only two possibilities for the polynomial $X^p - a$: either it is irreducible or it is the $p$-th power of a linear polynomial.

5. Let $\mathbb{F}_p(Y)$ be the field of rational functions over $\mathbb{F}_p$, which has characteristic $p$. Show that $X^p - Y$ is an irreducible polynomial over $\mathbb{F}_p(Y)$ all of whose roots coincide.

6. Prove that for any $d \mid n$, $d < n$, we have the relation

$$X^n - 1 = (X^d - 1)\,\Phi_n(X)\,h_d(X), \quad \text{where } h_d \in \mathbb{Z}[X].$$

7. Let $q$ be a positive integer $> 1$. According to (10), $\Phi_n(q) \in \mathbb{Z}$. Show that $\Phi_n(q) \mid q - 1 \implies n = 1$.

8. Verify that the cyclotomic polynomial $\Phi_{15}(X) = X^8 - X^7 + X^5 - X^4 + X^3 - X + 1$, considered over $\mathbb{F}_2$, is the product of the two irreducible polynomials $X^4 + X^3 + 1$ and $X^4 + X + 1$. Using this fact, prove that $\Phi_{15}(X)$ is irreducible over $\mathbb{Q}$ (compare with Exercise 11 of §1 Ch. 6).
9. Starting with the chain of natural inclusions

$$GF(p) \subset GF(p^{2!}) \subset GF(p^{3!}) \subset \cdots$$

introduce the so-called limiting field $\Omega_p = GF(p^{\infty!})$, by setting $\alpha \in \Omega_p \iff \alpha \in GF(p^{n!})$ for $n$ sufficiently large. Using the basic properties of finite fields, prove that $\Omega_p$ is an algebraically closed field. Along with $\mathbb{C}$, which has characteristic 0, these fields $\Omega_p$ provide examples of algebraically closed fields of any characteristic.


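10. Let $q = p^n$. Show that if $p = 2$, all of the elements of $\mathbb{F}_q$ are squares, while if $p > 2$ the squares in $\mathbb{F}_q^*$ form a subgroup of index 2 in $\mathbb{F}_q^*$, and

$$\mathbb{F}_q^{*2} = \{t \in \mathbb{F}_q^* \mid t^{(q-1)/2} = 1\}.$$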


11. (M. Aschbacher). Prove the following fact for $p > 5$. Let $F$ be a finite field with an odd number $q = p^n$ of elements. If $q \neq 3$ or $5$, then the "circle" $x^2 + y^2 = 1$ contains a point with coordinates $x, y \in F^*$.

12. Is every primitive element of the field $\mathbb{F}_q$ a generator of the multiplicative group $\mathbb{F}_q^*$?




13. (Theorem on the primitive element). Let $F = K(\theta_1, \theta_2, \ldots, \theta_r)$ be a finite algebraic extension of a field $K$ of characteristic zero. Show that $F = K(\theta)$ for some element $\theta$ algebraic over $K$. (Hint. Use induction on $r$ to reduce to the case $F = K(\alpha, \beta)$, where $\alpha$ and $\beta$ are algebraic over $K$ with distinct minimal polynomials $f(X)$ and $g(X)$. Let $L$ be the splitting field of the polynomial $f(X)g(X)$, so that $f(X) = (X - \alpha_1)(X - \alpha_2) \cdots (X - \alpha_n)$, $g(X) = (X - \beta_1)(X - \beta_2) \cdots (X - \beta_m)$, where $\alpha_i, \beta_j \in L$, $\alpha_1 = \alpha$, $\beta_1 = \beta$. Irreducibility of $f$ and $g$, together with the condition char $K = 0$, guarantees that the elements $\alpha_i, \beta_j$ are pair-wise distinct (see subsection 4 §1 Ch. 6), and we can consider the elements $(\beta_j - \beta)/(\alpha - \alpha_i)$, where $i, j \neq 1$. Take any rational number $c \neq 0$ different from all of these ratios (again using the condition char $K = 0$!), and set $\theta = \beta + c\alpha$.

Clearly $K(\theta) \subset K(\alpha, \beta) = F$. The polynomials $f(X)$ and $h(X) = g(\theta - cX) \in K(\theta)[X]$ have common root $\alpha$. If $\alpha_i$ is also a common root for some $i > 1$, then $0 = h(\alpha_i) = g(\theta - c\alpha_i)$, so that $\theta - c\alpha_i = \beta_j$ for some $j \geq 1$. But this means that either $c(\alpha - \alpha_i) = 0$ or else $c = (\beta_j - \beta)/(\alpha - \alpha_i)$, both of which possibilities are ruled out by our choice of $c$. Thus, $X - \alpha$ is the greatest common divisor of the polynomials $f, h \in L[X]$. But actually $f, h \in K(\theta)[X]$, and so (see subsection 3 §3 Ch. 5) we have g.c.d.$(f, h) \in K(\theta)[X]$. Therefore, $X - \alpha \in K(\theta)[X]$, i.e., $\alpha \in K(\theta)$ and then $\beta = \theta - c\alpha \in K(\theta)$. This means $K(\alpha, \beta) \subset K(\theta)$, i.e., $K(\alpha, \beta) = K(\theta)$.)


14. The picture

[Figure: angle-trisection diagram with labeled points C, O, D]

illustrates one way to trisect an angle: $\theta = \varphi/3$. The segments $OB$ and $CB$ have length 1. But how can one construct the point $C$ given the point $A$?


15. By constructing specific constructive number fields K, show that the 


degree [CS:Q] is infinite. 




§2. Various results about rings


This section is intended as a small but useful addendum to Chapters 4 and 5. 


1. More examples of unique factorization domains. In §3 Ch. 5 we proved that a Euclidean ring has unique factorization. Such rings include the ring $\mathbb{Z}$ and the polynomial ring $K[X]$. Below we give another example of a Euclidean ring, and also an example of a unique factorization domain which is not Euclidean.
Example 1. The ring of Gaussian integers. This is the ring

$$\mathbb{Z}[i] = \{m + in \mid m, n \in \mathbb{Z}\}$$

which is contained in the quadratic number field $\mathbb{Q}(i) \supset \mathbb{Q}$, $i^2 + 1 = 0$. Geometrically, this ring can be thought of as the grid (lattice points) in the complex plane with integer coordinates. $\mathbb{Z}[i]$ is clearly an integral domain. On the set $\mathbb{Z}[i]^*$ of non-zero elements of $\mathbb{Z}[i]$ we define a map $\delta: \mathbb{Z}[i]^* \to \mathbb{N} \cup \{0\}$ by setting $\delta(m + in) = |m + in|^2 = m^2 + n^2$ (in other words, $\delta(a) = N(a)$ is the norm of $a$ in $\mathbb{Q}(i)$ in the sense of Subsection 5 of §1 Ch. 5). We know that $\delta(ab) = \delta(a)\delta(b) \geq \delta(a)$ for all $a, b \in \mathbb{Z}[i]^*$, so that property (E1) in the definition of a Euclidean ring (see Subsection 3 §3 Ch. 5) is automatically fulfilled. In order to see (E2), we write the fraction $ab^{-1}$ with $b \neq 0$ in the form $ab^{-1} = \alpha + i\beta$ with $\alpha, \beta \in \mathbb{Q}$ and we take the closest integers $k$ and $\ell$ to $\alpha$ and $\beta$, so that $\alpha = k + \nu$ and $\beta = \ell + \mu$ where $|\nu| \leq \tfrac{1}{2}$, $|\mu| \leq \tfrac{1}{2}$. Then

$$a = b[(k + \nu) + i(\ell + \mu)] = bq + r,$$

where $q = k + i\ell \in \mathbb{Z}[i]$ and $r = b(\nu + i\mu)$. Since $r = a - bq$, we have $r \in \mathbb{Z}[i]$, and

$$\delta(r) = |r|^2 = |b|^2(\nu^2 + \mu^2) \leq |b|^2\left(\tfrac{1}{4} + \tfrac{1}{4}\right) = \tfrac{1}{2}\,\delta(b) < \delta(b).$$

Hence, $\mathbb{Z}[i]$ is a Euclidean ring. □
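The verification of (E2) translates directly into a division-with-remainder routine. In the following Python sketch a Gaussian integer $m + in$ is stored as the pair `(m, n)`; this representation and the names `norm`, `mul`, `divmod_gauss` are our own choices. To stay in exact integer arithmetic, the quotient $ab^{-1}$ is computed as $a\bar{b}/\delta(b)$ and each coordinate is rounded to a nearest integer.

```python
def norm(z):
    """delta(m + in) = m^2 + n^2."""
    m, n = z
    return m * m + n * n


def mul(a, b):
    """Product of two Gaussian integers given as pairs."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def divmod_gauss(a, b):
    """Return (q, r) with a = b*q + r and norm(r) <= norm(b) / 2, for b != 0."""
    nb = norm(b)
    # a / b = a * conj(b) / delta(b); x/nb and y/nb are the rationals alpha, beta
    x, y = mul(a, (b[0], -b[1]))
    # floor(t + 1/2) is a nearest integer to t
    q = ((2 * x + nb) // (2 * nb), (2 * y + nb) // (2 * nb))
    bq = mul(b, q)
    r = (a[0] - bq[0], a[1] - bq[1])
    return q, r
```

For example, `divmod_gauss((27, 23), (8, 1))` returns `((4, 2), (-3, 3))`: indeed $(8 + i)(4 + 2i) = 30 + 20i$, and the remainder $-3 + 3i$ has norm $18 \leq 65/2$.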




The ring of Gaussian integers is convenient for illustrating the methods of algebraic 
number theory in a simple setting. For this reason we shall go into a little more detail on 


the properties of Z[i]. We first make some general remarks. 


1) An integral domain $R$ all of whose ideals are principal, i.e., have the form $xR$, is called a principal ideal domain. Every Euclidean ring is a principal ideal domain. We have already established this in the case of $\mathbb{Z}$ and $K[X]$ (see the corollary to Theorem 5 §2 Ch. 5), and the proof is completely analogous in the general case: if $J$ is an ideal in a Euclidean ring $R$, then $J = aR$ if we choose $a \in J$ so that $\delta(a) \leq \delta(x)$ for all non-zero $x \in J$.

2) Let $R$ be an arbitrary Euclidean ring with function $\delta$ (see Subsection 3 §3 Ch. 5), and let $U(R)$ be its group of invertible elements. Then

$$u \in U(R) \iff \delta(u) = \delta(1) \iff \delta(ux) = \delta(x) \text{ for all } x \in R^*. \tag{1}$$

To see this, using (E1) we have $\delta(x) = \delta(1 \cdot x) \geq \delta(1)$ for all $x \in R^*$, and if $u \in U(R)$, then $\delta(1) = \delta(u u^{-1}) \geq \delta(u)$, so that $\delta(u) = \delta(1)$. Conversely, by Remark 1), we have $\delta(ux) = \delta(x)$, $\forall x \in R^* \implies uxR = xR \implies x = uxv \implies uv = 1 \implies u \in U(R)$.

Applied to $\mathbb{Z}[i]$, the criterion (1) above means that $m + in \in U(\mathbb{Z}[i]) \iff m^2 + n^2 = 1$. Thus, $U(\mathbb{Z}[i])$ is the cyclic multiplicative group of order 4 generated by $i$.

3) An ideal $J$ in a ring $R$ is called maximal if $J \neq R$ and if every ideal $T$ containing $J$ coincides with either $J$ or $R$. In a Euclidean ring $R$, an element $p \in R$ is prime if and only if the ideal $pR$ is maximal. To see this, first let $p$ be a prime element, and suppose that $pR \subset T \subset R$, where $T$ is an ideal of $R$. By Remark 1), we have $T = aR$, and since $p \in T$, that means that $p = ab$, where one of the elements $a$ or $b$ must be invertible (since $p$ is prime). If $a \in U(R)$, then $T = aR = R$. If $b \in U(R)$, then $T = aR = abR = pR$. Conversely, suppose that the ideal $pR$ is maximal, and $p = ab$, where $a \notin U(R)$. Then $aR \neq R$, and $pR \subset aR$, so that $pR = aR$. But then $a = pu = abu$, and hence $bu = 1$ and $b \in U(R)$. This means that $p$ is a prime element, and the proof of Remark 3) is complete.


We now look at what happens to a rational prime $p \in \mathbb{Z}$ when considered in the ring $\mathbb{Z}[i]$. It may happen that $p$ remains a prime element in $\mathbb{Z}[i]$, but if this is not the case let $p = \prod_{k=1}^{r} p_k$ be its (unique by Theorem 4 §3 Ch. 5) factorization into prime elements of $\mathbb{Z}[i]$, where $r > 1$. According to Remark 2), we have $\delta(p_k) \neq 1$ for each $k$, so that the equation $p^2 = \delta(p) = \prod_{k=1}^{r} \delta(p_k)$ implies that $r = 2$, $p = p_1 p_2$, $\delta(p_1) = \delta(p_2) = p$. If $p_1 = m + in$, then $p = \delta(p_1) = m^2 + n^2 = (m + in)(m - in) \implies p_2 = m - in$. Thus, if a prime $p \in \mathbb{Z}$ has a non-trivial factorization in $\mathbb{Z}[i]$, then

$$p = (m + in)(m - in) = m^2 + n^2, \tag{2}$$

where $m + in$ and $m - in$ are prime elements in $\mathbb{Z}[i]$.

For example, $2 = (1 + i)(1 - i)$ is not a prime element in $\mathbb{Z}[i]$. Also note that $t^2 \equiv 0$ or $1 \pmod 4$ for any $t \in \mathbb{Z}$. Hence, if $p$ is an odd prime which is not a prime element in $\mathbb{Z}[i]$, the criterion (2) tells us that

$$p = m^2 + n^2 \equiv 0,\ 1 \text{ or } 2 \pmod 4 \implies p \text{ is of the form } 4k + 1.$$


We now let $p$ be of the form $4k + 1$, and claim that $p$ does not remain prime in $\mathbb{Z}[i]$. Set $t = (2k)!$. Since clearly $t = (-1)^{2k}(2k)! = (-1)(-2) \cdots (-2k) \equiv (p-1)(p-2) \cdots (p-2k) = \frac{p+1}{2} \cdots (p-2)(p-1) \pmod p$, it follows that

$$t^2 \equiv \left[1 \cdot 2 \cdots \tfrac{p-1}{2}\right] \cdot \left[\tfrac{p+1}{2} \cdots (p-2)(p-1)\right] = (p-1)! \pmod p,$$

or, by Wilson's theorem (see the end of §1 Ch. 6), we have $t^2 + 1 \equiv 0 \pmod p$. Now if $p$ were a prime element in $\mathbb{Z}[i]$, then it would follow by Theorem 1 §3 Ch. 5 together with the relation $(t + i)(t - i) = t^2 + 1 = \ell p$ for some $\ell \in \mathbb{Z}$ that $p$ must divide either $t + i$ or $t - i$. But comparing imaginary parts in a relation of the form $t \pm i = p(m + in)$ gives: $\pm 1 = pn$, $n \in \mathbb{Z}$, which is absurd. Thus, we have proved the following fact:

A prime number $p \in \mathbb{Z}$ remains prime in $\mathbb{Z}[i]$ if and only if $p = 4k - 1$. Every prime of the form $p = 4k + 1$ can be written in the form $m^2 + n^2$, where $m, n \in \mathbb{Z}$. □
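The argument above can be turned into a procedure for actually finding $m$ and $n$: since $p$ divides $(t + i)(t - i)$ but neither factor, a greatest common divisor of $p$ and $t + i$ in $\mathbb{Z}[i]$ is a prime element of norm $p$. The Python sketch below is one way to carry this out, under our own conventions (pairs `(m, n)` for $m + in$, and the helper names `gauss_mul`, `gauss_mod`, `gauss_gcd`, `two_squares`); the text itself only proves existence.

```python
from math import factorial


def gauss_mul(a, b):
    """Product of two Gaussian integers given as pairs."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def gauss_mod(a, b):
    """Remainder r of a modulo b with norm(r) <= norm(b) / 2, for b != 0."""
    nb = b[0] ** 2 + b[1] ** 2
    x, y = gauss_mul(a, (b[0], -b[1]))            # a * conj(b)
    q = ((2 * x + nb) // (2 * nb), (2 * y + nb) // (2 * nb))
    bq = gauss_mul(b, q)
    return (a[0] - bq[0], a[1] - bq[1])


def gauss_gcd(a, b):
    """Euclidean algorithm in Z[i]; the result is determined up to a unit."""
    while b != (0, 0):
        a, b = b, gauss_mod(a, b)
    return a


def two_squares(p):
    """For a prime p = 4k + 1 return (m, n) with m^2 + n^2 = p."""
    k = (p - 1) // 4
    t = factorial(2 * k) % p           # t^2 + 1 is divisible by p (Wilson)
    m, n = gauss_gcd((p, 0), (t, 1))   # common divisor of p and t + i
    return abs(m), abs(n)
```

For instance, `two_squares(13)` returns `(2, 3)`.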


It is now relatively easy to prove a general number-theoretic theorem about when 


a given integer can be written as the sum of two squares. 


THEOREM 1. A number $t \in \mathbb{Z}$ can be represented as the sum of the squares of two integers $m$ and $n$ if and only if every prime number $p = 4k - 1$ in the prime factorization of $t$ occurs with an even exponent.

Proof. In addition to the facts we already know, it suffices to show that, for an odd prime $p$:

$$\text{g.c.d.}(m, n) = 1,\ m^2 + n^2 \equiv 0 \pmod p \implies mn \not\equiv 0 \pmod p \implies m^{p-1} \equiv 1 \pmod p,$$

$$n^2 \equiv -m^2 \pmod p \implies (n m^{p-2})^2 = n^2 m^{2(p-2)} \equiv -m^{2(p-1)} \equiv -1 \pmod p.$$

Thus, there exists an integer $s \in \mathbb{Z}$ such that $s^2 \equiv -1 \pmod p$, $s^4 \equiv 1 \pmod p$. Hence, the order $p - 1$ of the multiplicative group $\mathbb{F}_p^*$ is divisible by 4, and $p$ is of the form $4k + 1$. □
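Theorem 1 is easy to test on small positive integers by comparing a direct search with the factorization criterion. In this Python sketch the names `is_sum_of_two_squares` and `criterion` are ours, and both functions assume $t \geq 1$.

```python
from math import isqrt


def is_sum_of_two_squares(t):
    """Direct search for m with t - m^2 a perfect square."""
    return any(isqrt(t - m * m) ** 2 == t - m * m for m in range(isqrt(t) + 1))


def criterion(t):
    """True iff every prime p = 4k - 1 divides t to an even power."""
    p = 2
    while p * p <= t:
        e = 0
        while t % p == 0:              # only primes can divide t at this point
            t //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    # whatever is left is 1 or a single prime occurring to the first power
    return not (t > 1 and t % 4 == 3)
```

For example, $45 = 3^2 \cdot 5 = 6^2 + 3^2$ passes the criterion, while $21 = 3 \cdot 7$ does not.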


By Remark 3), the fact that $p = 4k - 1$ is prime in $\mathbb{Z}[i]$ is equivalent to maximality of the ideal $p\mathbb{Z}[i]$, which, in turn, is equivalent to the quotient ring $\mathbb{Z}[i]/p\mathbb{Z}[i]$ being a field of $p^2$ elements (in this connection, see the isomorphism theorems in Subsection 2 and also Exercise 14 of §4 Ch. 4). This is not surprising, since when $p = 4k - 1$ the polynomial $X^2 + 1$ is irreducible over $\mathbb{Z}_p$.

Example 2. Polynomial rings over unique factorization domains. We shall now show that the polynomial rings $\mathbb{Z}[X_1, \ldots, X_n]$ and $K[X_1, \ldots, X_n]$ (where $K$ is a field) are unique factorization domains for any $n$. This important fact follows immediately from the following theorem.




THEOREM 2. If $R$ is a unique factorization domain, then so is the polynomial ring $R[X]$.

Proof. The proof is based on the properties of polynomial rings related to Gauss's lemma (see §3 Ch. 5). Namely, we shall need the following two properties:

(a) Two primitive polynomials $f, g \in R[X]$ which are associated in $Q(R)[X]$ ($Q(R)$ is the field of fractions of $R$) are associated in $R[X]$ (this is an easy exercise using Gauss's lemma).

(b) A polynomial $f \in R[X]$ of positive degree which is irreducible over $R$ is also irreducible over $Q(R)$ (the proof given in §3 Ch. 5 for $R = \mathbb{Z}$ also works in the general case).

We now proceed to the proof of the theorem. If $f \in R[X]$ is a polynomial of positive degree, we write $f = d(f) f_0$, where $d(f)$ is the content of $f$ and $f_0$ is a primitive polynomial. Using induction on the degree of a primitive polynomial, we obtain a decomposition of $f_0$ into a product $f_0 = f_1 \cdots f_s$ of primitive polynomials $f_i$ which are irreducible over $R$. Suppose that $f_0 = g_1 \cdots g_t$ is another such factorization. Then, by property (b) above, the $f_i$ and $g_j$ are irreducible over $Q(R)$, and, since the ring $Q(R)[X]$ is a unique factorization domain (see the corollary to Theorem 4 §3 Ch. 5), we have $s = t$ and, with a suitable ordering of the $f$'s and $g$'s, $f_i$ is associated with $g_i$ in $Q(R)[X]$. Consequently, by property (a), they are also associated in $R[X]$. If the content $d(f)$ is not invertible in $R$, we factor it in $R$: $d(f) = p_1 \cdots p_r$, and finally arrive at a factorization of $f$. This factorization is unique (in the usual sense), because we have just seen that the factorization of $f_0$ is unique, and the same holds for the factorization of $d(f)$ because $R$ is a unique factorization domain. □


We have strict inclusions

$$\{\text{Euclidean rings}\} \subset \{\text{principal ideal domains}\} \subset \{\text{unique factorization domains}\}. \tag{3}$$

We have already established the first inclusion (see Remark 1)). There are examples (we shall not give them) which show that it is a strict inclusion. To prove the second inclusion, we let $R$ be a principal ideal domain and consider an increasing sequence of ideals $(d_1) \subset (d_2) \subset \cdots$ in $R$. We immediately see that $D = \bigcup_i (d_i)$ is an ideal in $R$. Hence, $D = (d)$, $d \in D$. By the definition of $D$, we must have $d \in (d_m)$ for some $m$, and so $(d_m) = (d_{m+1}) = \cdots$. Since we have just shown that an increasing sequence of ideals must stabilize at a finite distance out, it follows that a sequence of non-invertible divisors $d_1,\ d_2 \mid d_1,\ \ldots,\ d_{m+1} \mid d_m,\ \ldots$ must also stabilize; hence, any element in $R$ factors into a product of indecomposable elements. We see unique factorization using the same type of argument: $(a, b) = aR + bR = dR = (d) \implies d = \text{g.c.d.}(a, b) = ax + by$. The rest of the argument is the same as in the proof of Theorem 3 (ii) in §3 Ch. 5.

The ideal $(2, X)$ in $\mathbb{Z}[X]$ and the ideal $(X, Y)$ in $\mathbb{R}[X, Y]$ are not principal ideals (see the example in Subsection 3 §2 Ch. 5). But, by Theorem 2, the rings $\mathbb{Z}[X]$ and $\mathbb{R}[X, Y]$ are unique factorization domains. Thus, the second inclusion in (3) is strict. □


Principal ideal domains are interesting from a purely algebraic viewpoint, since they 
are characterized by the properties of very natural objects -- the kernels of homo- 
morphisms. On the other hand, Euclidean rings are more useful to work with because they 


have the division algorithm. 


2. Ring theoretic constructions. We already have at our disposal a significant arsenal of types of rings and methods for constructing new rings from ones we start with. For example, we have matrix rings $M_n(R)$, fields of fractions $Q(R)$, and polynomial rings $R[X_1, \ldots, X_n]$, where $R$ is a commutative ring (an integral domain in the case of $Q(R)$). It is worthwhile to discuss, at least briefly, the ring-theoretic analogs of the general facts about homomorphisms which were established for groups in Chapter 7. As a rule, the proofs will be no different from the case of groups, and so will be left to the reader as exercises.

To the fundamental theorem on ring homomorphisms (Theorem 2 §4 Ch. 4), we add two isomorphism theorems.


THEOREM 3. Let $R$ be a ring, $S$ be a subring, and $J$ be an ideal of $R$. Then $S + J = \{x + y \mid x \in S,\ y \in J\}$ is a subring of $R$ containing $J$ as an ideal, and $S \cap J$ is an ideal in $S$. The map

$$\varphi: x + J \mapsto x + S \cap J, \quad x \in S,$$

gives a ring isomorphism

$$(S + J)/J \cong S/(S \cap J).$$

Proof. The first two assertions are obvious. As for the last one, consider the restriction $\pi_0 = \pi|_S$ of the natural epimorphism $\pi: R \to R/J$. The image $\mathrm{Im}\,\pi_0$ consists of the cosets $x + J$ with $x \in S$, i.e., $\mathrm{Im}\,\pi_0 = (S + J)/J$. The kernel $\mathrm{Ker}\,\pi_0$ of the epimorphism $\pi_0: S \to (S + J)/J$ consists of the elements $x \in S$ for which $x + J = J$. That is, $\mathrm{Ker}\,\pi_0 = S \cap J$. By the fundamental homomorphism theorem, the correspondence $\bar{\pi}_0: x + S \cap J \mapsto \pi_0(x) = x + J$ gives an isomorphism $S/(S \cap J) \cong (S + J)/J$. It remains to note that $\varphi = \bar{\pi}_0^{-1}$. □

We have gone through the details of a proof which is essentially copied from the proof of Theorem 2 §3 Ch. 7 in order to emphasize the complete parallelism with group theory.


THEOREM 4. Let $R$ be a ring, $S$ be a subring, and $J \subset S$ be an ideal of $R$.
Then $\bar S = S/J$ is a subring of $\bar R = R/J$, and $\pi^*: S \mapsto \bar S$ is a bijective map from the set
$\Omega(R, J)$ of all subrings of $R$ containing $J$ to the set $\Omega(\bar R)$ of all subrings of $\bar R$. If
$S \in \Omega(R, J)$, then $S$ is an ideal in $R$ if and only if $\bar S$ is an ideal in $\bar R$, and

$$R/S \cong \bar R/\bar S = (R/J)/(S/J).$$


Proof. This is an easy exercise, following the proof of Theorem 3 §3 Ch. 7. □


COROLLARY. Let $R$ be a commutative ring with unit 1. An ideal $J$ is
maximal in $R$ if and only if the quotient ring $R/J$ is a field. □


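The following operations are defined on the set of ideals of a ring $R$:


sum: $J_1 + J_2 = \{x_1 + x_2 \mid x_k \in J_k\}$;
intersection: $J_1 \cap J_2 = \{x \mid x \in J_1,\ x \in J_2\}$;
product: $J_1 J_2 = \{\sum_i x_i y_i \mid x_i \in J_1,\ y_i \in J_2\}$ (finite sums).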


One can also speak of the sum, product, or intersection of a finite number of ideals, and
we have the following fact.

PROPOSITION. If $R$ is a ring with unit, and the equalities

$$J + J_k = R, \qquad k = 1, \dots, n,$$

hold for ideals $J, J_1, \dots, J_n$, then the following equalities also hold:

$$J + J_1 \cap \dots \cap J_n = R = J + J_1 J_2 \cdots J_n.$$

Proof. Since $J_1 J_2 \cdots J_n \subset J_1 \cap J_2 \cap \dots \cap J_n$, it suffices to prove that
$J + J_1 J_2 \cdots J_n = R$. If $n = 1$, this is true by assumption. If $n = 2$, we have

$$1 = 1 \cdot 1 = (x_1 + y_1)(x_2 + y_2) = x_1 x_2 + x_1 y_2 + y_1 x_2 + y_1 y_2,$$

where $x_1, x_2 \in J$, $y_i \in J_i$. Hence, $1 \in J + J_1 J_2$, and $J + J_1 J_2 = R$. We now use
an obvious argument by induction on $n$. □




Let $R_1, \dots, R_n$ be a finite set of rings, and let $R = R_1 \times \dots \times R_n$ be the
cartesian product of them as sets. We introduce a ring structure on $R$ by defining
addition and multiplication component by component:

$$(x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n),$$
$$(x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = (x_1 y_1, \dots, x_n y_n).$$

This gives us the so-called (external) direct sum $R = R_1 \oplus \dots \oplus R_n$ of the rings $R_i$.
Each of the components $R_i$ is the image of the natural epimorphism $\pi_i: R \to R_i$ which
takes $(x_1, \dots, x_n)$ to $x_i \in R_i$. Furthermore, if $J_i = \{(0, \dots, 0, x_i, 0, \dots, 0) \mid x_i \in R_i\}$,
then $J_i \cong R_i$ is an ideal in $R$, and $R = J_1 + \dots + J_n$.

Now suppose that $R$ is a ring with ideals $J_1, \dots, J_n$, where $R = J_1 + \dots + J_n$
and $J_k \cap \bigl(\sum_{i \ne k} J_i\bigr) = 0$, $1 \le k \le n$. Then $R = J_1 \oplus \dots \oplus J_n$ is the (internal) direct
sum of its ideals $J_k$. As in group theory, the difference between external and internal
direct sums is rather pedantic, and there is no need to distinguish between the two in our
notation.


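3. Number theoretic applications. The universal property of direct sums is the
following: if $S = R_1 \oplus \dots \oplus R_n$, and $R$ is any ring having homomorphisms
$\varphi_i: R \to R_i$, then there exists a unique homomorphism $\varphi = (\varphi_1, \dots, \varphi_n): R \to S$ with
kernel $\operatorname{Ker} \varphi = \bigcap_i \operatorname{Ker} \varphi_i$ which makes the triangular diagram formed by
$\varphi: R \to S$, $\pi_i: S \to R_i$ and $\varphi_i: R \to R_i$ commutative, i.e., $\pi_i \circ \varphi = \varphi_i$,
for $i = 1, \dots, n$. We apply this obvious fact to a ring $R$ having unit 1
and ideals $J_1, \dots, J_n$ and to the direct sum

$$S = R/J_1 \oplus \dots \oplus R/J_n.$$

Setting $\varphi_i: R \to R/J_i = R_i$, we obtain a homomorphism

$$\varphi:\ x \mapsto (x + J_1, \dots, x + J_n) \qquad (4)$$

from $R$ to $S$ with kernel $\operatorname{Ker} \varphi = J_1 \cap \dots \cap J_n$.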


THEOREM 5 (The Chinese Remainder Theorem). Under the above conditions, if
$R$ is a ring with unit and $J_i + J_j = R$ for $1 \le i \ne j \le n$, then the map $\varphi$ in (4)
above is an epimorphism.


Proof. We must check that, given any elements $x_1, \dots, x_n \in R$, there exists an
$x \in R$ such that $x_i + J_i = x + J_i$, i.e., $x - x_i \in J_i$, for $i = 1, \dots, n$. If
$n = 1$, this is obvious; if $n = 2$, we take elements $a_1 \in J_1$ and $a_2 \in J_2$ such that
$a_1 + a_2 = 1$, and we set $x = x_1 a_2 + x_2 a_1$. Then

$$x - x_1 = (x_1 a_2 + x_2 a_1) - x_1 (a_1 + a_2) = (x_2 - x_1) a_1 \in J_1,$$
$$x - x_2 = (x_1 a_2 + x_2 a_1) - x_2 (a_1 + a_2) = (x_1 - x_2) a_2 \in J_2.$$

We now use induction on $n$. Suppose that we have already found an element $y$ such that
$y - x_i \in J_i$, $i = 1, \dots, n-1$. Since by assumption $J_i + J_n = R$ for $1 \le i < n$,
it follows by the proposition in Subsection 2 that $J_1 \cap \dots \cap J_{n-1} + J_n = R$. Now apply
the case $n = 2$, which was already proved, using the two ideals $J_1 \cap \dots \cap J_{n-1}$ and $J_n$
and the two elements $y$ and $x_n$. We find $x \in R$ with $x - y \in J_1 \cap \dots \cap J_{n-1}$ and
$x - x_n \in J_n$. But then $x - y \in J_i$, $1 \le i \le n-1$, and, by our choice of $y$,
$x - x_i = (x - y) + (y - x_i) \in J_i$.

Thus, the element $x$ satisfies all of the requirements. □
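The proof is constructive, and for $R = \mathbb{Z}$ it can be followed step by step. The sketch below is only an illustration under that assumption: the function names `ext_gcd`, `crt_pair` and `crt` are ours, the elements $a_1, a_2$ with $a_1 + a_2 = 1$ are obtained from the Euclidean algorithm, and the loop in `crt` mirrors the induction on $n$.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g (a, b >= 0)."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)


def crt_pair(x1, m1, x2, m2):
    """Case n = 2: find x with x = x1 (mod m1) and x = x2 (mod m2)."""
    g, s, t = ext_gcd(m1, m2)
    if g != 1:
        raise ValueError("moduli must be relatively prime")
    a1, a2 = s * m1, t * m2          # a1 in m1*Z, a2 in m2*Z, a1 + a2 = 1
    return (x1 * a2 + x2 * a1) % (m1 * m2)


def crt(residues, moduli):
    """Induction step: combine y (mod m_1...m_{k-1}) with x_k (mod m_k)."""
    y, m = residues[0] % moduli[0], moduli[0]
    for x_k, m_k in zip(residues[1:], moduli[1:]):
        y = crt_pair(y, m, x_k, m_k)
        m *= m_k
    return y


if __name__ == "__main__":
    x = crt([2, 3, 2], [3, 5, 7])
    print(x, [x % m for m in (3, 5, 7)])
```

The intermediate modulus `m` plays the role of the ideal $J_1 \cap \dots \cap J_{n-1}$, which for pairwise relatively prime integers is generated by their product.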


In Theorem 5 and the arguments preceding it, the ring $R$ was not assumed to be
commutative. Now suppose that $R$ is an integral domain and $a_1, \dots, a_n$ are pairwise
relatively prime elements, i.e., $a_i R + a_j R = R$ if $i \ne j$. (In a unique factorization
domain, this agrees with the definition that elements are relatively prime if their
factorizations contain disjoint sets of primes.) We write the relation $x - x_i \in a_i R$ as a
congruence modulo the principal ideal $a_i R$: $x \equiv x_i \pmod{a_i}$.


COROLLARY 1. Let $R$ be an integral domain, and let $a_1, \dots, a_n$ be pairwise
relatively prime elements. Then for any $x_1, \dots, x_n \in R$ there exists $x \in R$
such that $x \equiv x_i \pmod{a_i}$ for $i = 1, \dots, n$. □


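COROLLARY 2. Let $n$ be a positive integer with prime factorization
$n = p_1^{m_1} \cdots p_r^{m_r}$; let $\mathbb{Z}_n = \mathbb{Z}/n\mathbb{Z}$ be the ring of residue classes modulo $n$ and let
$U(\mathbb{Z}_n)$ be its group of invertible elements. Then:

(i) $\mathbb{Z}_n \cong \mathbb{Z}_{p_1^{m_1}} \oplus \dots \oplus \mathbb{Z}_{p_r^{m_r}}$ (as a direct sum of rings);
(ii) $U(\mathbb{Z}_n) \cong U(\mathbb{Z}_{p_1^{m_1}}) \times \dots \times U(\mathbb{Z}_{p_r^{m_r}})$ (as a direct product of groups).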

Proof. (i) Replacing $n$ by $r$ in (4), and setting $R = \mathbb{Z}$, $J_i = p_i^{m_i}\mathbb{Z}$, and
$S = \mathbb{Z}_{p_1^{m_1}} \oplus \dots \oplus \mathbb{Z}_{p_r^{m_r}}$, we obtain a homomorphism $\varphi: \mathbb{Z} \to S$ with kernel
$\operatorname{Ker} \varphi = \bigcap_i J_i = n\mathbb{Z}$. Theorem 5 implies that $\varphi$ is epimorphic, since
$\operatorname{g.c.d.}(p_i^{m_i}, p_j^{m_j}) = 1$ for $i \ne j$.

(ii) Since the components kill one another in any direct sum of rings
$R = R_1 \oplus \dots \oplus R_r$, in other words $R_i R_j = 0$ for $i \ne j$, it follows immediately from
the definition of invertible elements and part (i) that $U(R) = U(R_1) \times \dots \times U(R_r)$. □
ie 


r m, 
Remark. Part (ii) of Corollary 2 immediately implies that $\varphi(n) = \prod_{i=1}^{r} \varphi(p_i^{m_i})$
and, since $\varphi(p^m) = p^{m-1}(p - 1)$, we again obtain the formula for the Euler function (see
Example 1 in Subsection 4 of §1). Since the order of an element in a finite group divides
the order of the group, we find that

$$a^{\varphi(n)} \equiv 1 \pmod n$$

for any integer $a$ prime to $n$. (This generalization of Fermat's Little Theorem is
sometimes known as Euler's Theorem.)
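As a numerical check of the remark, the following sketch computes $\varphi(n)$ from the prime factorization and compares it with a direct count of the residues prime to $n$. The helper names `factorize`, `euler_phi` and `units` are ours, and trial division is used only because it is short, not because it is efficient.

```python
from math import gcd


def factorize(n):
    """Return {p: m} with n equal to the product of p**m (trial division)."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def euler_phi(n):
    """phi(n) as the product of p**(m-1) * (p-1) over the prime powers of n."""
    result = 1
    for p, m in factorize(n).items():
        result *= p ** (m - 1) * (p - 1)
    return result


def units(n):
    """Representatives 1 <= a <= n of the invertible residue classes mod n."""
    return [a for a in range(1, n + 1) if gcd(a, n) == 1]


if __name__ == "__main__":
    n = 45
    print(euler_phi(n), len(units(n)))
    # Euler's Theorem: every unit raised to phi(n) is 1 modulo n.
    print(all(pow(a, euler_phi(n), n) == 1 for a in units(n)))
```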


By Corollary 2, in order fully to understand the structure of the group $U(\mathbb{Z}_n)$, it
suffices to consider the case $n = p^m$.


THEOREM 6. Let $m$ be a positive integer.

(i) If $p$ is an odd prime, then $U(\mathbb{Z}_{p^m})$ is a cyclic group.

(ii) The groups $U(\mathbb{Z}_2)$ and $U(\mathbb{Z}_4)$ are cyclic of order 1 and 2,
respectively, while $U(\mathbb{Z}_{2^m})$, $m \ge 3$, is the direct product of a cyclic group of order
$2^{m-2}$ and a cyclic group of order 2.


Proof. (i) By definition, an integer $t$ prime to $n$ has order $r$ modulo $n$
if $t^r \equiv 1 \pmod n$, but $t^k \not\equiv 1 \pmod n$ for $0 < k < r$. If $r = \varphi(n)$, we call $t$ a
primitive root modulo $n$. We usually choose $t$ from among the least positive residues
$1, 2, \dots, n-1$ modulo $n$, but it is not really necessary to fix any particular set of
residue class representatives.

According to Theorem 5 §1, the group $\mathbb{F}_p^* = U(\mathbb{Z}_p)$ is cyclic, i.e., there exists
a primitive root $a_0$ modulo $p$. Since $a_0^{p^{m-1}} \equiv a_0 \pmod p$, it follows that the integer
$a = a_0^{p^{m-1}}$ is also a primitive root modulo $p$. On the other hand,
$a^{p-1} = a_0^{p^{m-1}(p-1)} = a_0^{\varphi(p^m)} \equiv 1 \pmod{p^m}$. Hence, the coset $\bar a = a + p^m\mathbb{Z}$ generates a cyclic group of
order $p - 1$ in $U(\mathbb{Z}_{p^m})$.

Furthermore,

$$(1 + p)^p = \sum_{i=0}^{p} \binom{p}{i} p^i = 1 + p^2 + \tfrac{1}{2}(p-1)p^3 + \dots$$

Since $p > 2$, we have $(1 + p)^p \equiv 1 + p^2 \pmod{p^3}$. Making the induction assumption
that $(1 + p)^{p^j} \equiv 1 + p^{j+1} \pmod{p^{j+2}}$, i.e., $(1 + p)^{p^j} = 1 + (1 + sp)p^{j+1}$ for some
integer $s$, we find that

$$(1 + p)^{p^{j+1}} = \{1 + (1 + sp)p^{j+1}\}^p = \sum_{i=0}^{p} \binom{p}{i} (1 + sp)^i p^{i(j+1)} = 1 + (1 + sp)p^{j+2} + \tfrac{1}{2}(p-1)(1 + sp)^2 p^{2j+3} + \dots,$$

and hence $(1 + p)^{p^{j+1}} \equiv 1 + p^{j+2} \pmod{p^{j+3}}$. In particular, $(1 + p)^{p^{m-1}} \equiv 1 \pmod{p^m}$,
but $(1 + p)^{p^{m-2}} \equiv 1 + p^{m-1} \not\equiv 1 \pmod{p^m}$; hence, the coset $\bar b = 1 + p + p^m\mathbb{Z}$ with
representative $1 + p$ generates a cyclic group of order $p^{m-1}$ in $U(\mathbb{Z}_{p^m})$. According
to the proposition in Subsection 3 §2 Ch. 4, if elements $a$ and $b$ have relatively prime
orders $p - 1$ and $p^{m-1}$, their product must generate a cyclic group $\langle ab \rangle$ of order
$p^{m-1}(p - 1) = \varphi(p^m) = |U(\mathbb{Z}_{p^m})|$.
p 


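(ii) The assertion about $U(\mathbb{Z}_2)$ and $U(\mathbb{Z}_4)$ is obvious. If $m > 2$, then,
starting from the trivial congruence $5 \equiv 1 + 2^2 \pmod{2^3}$ and using induction on $j$, we
easily prove that

$$5^{2^j} \equiv 1 + 2^{j+2} \pmod{2^{j+3}}.$$

In particular,

$$5^{2^{m-3}} \equiv 1 + 2^{m-1} \not\equiv 1 \pmod{2^m}, \qquad 5^{2^{m-2}} \equiv 1 \pmod{2^m},$$

so that 5 has order $2^{m-2}$ modulo $2^m$, and the coset $5 + 2^m\mathbb{Z}$ generates a
cyclic subgroup of index 2 in $U(\mathbb{Z}_{2^m})$. Note that, since $5^j \equiv 1 \pmod 4$ for all $j$,
the coset $-1 + 2^m\mathbb{Z}$ is not in $\langle 5 + 2^m\mathbb{Z} \rangle$. Since $(-1 + 2^m\mathbb{Z})^2 = 1 + 2^m\mathbb{Z}$, we see that

$$U(\mathbb{Z}_{2^m}) = \langle 5 + 2^m\mathbb{Z} \rangle \times \langle -1 + 2^m\mathbb{Z} \rangle$$

is an abelian 2-group of type $(2^{m-2}, 2)$ (see §5 Ch. 7). □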


COROLLARY. The group $U(\mathbb{Z}_n)$ is cyclic (equivalently, there exists a primitive
root modulo $n$) if and only if $n$ has the form $2$, $4$, $p^m$, or $2p^m$, where $p$ is an
odd prime. □
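The corollary and part (ii) of Theorem 6 are easy to test by brute force for small $n$. The sketch below is our own illustration (the names `unit_group`, `order_mod`, `is_cyclic` and `has_predicted_form` do not come from the text): it searches for an element whose order equals $|U(\mathbb{Z}_n)|$ and compares the outcome with the predicted shape of $n$.

```python
from math import gcd


def unit_group(n):
    """Representatives of the invertible residue classes modulo n."""
    return [a for a in range(1, n + 1) if gcd(a, n) == 1]


def order_mod(a, n):
    """Multiplicative order of a modulo n (a is assumed prime to n)."""
    k, power = 1, a % n
    while power != 1 % n:
        power = power * a % n
        k += 1
    return k


def is_cyclic(n):
    """True if some unit has order |U(Z_n)|, i.e. a primitive root exists."""
    group = unit_group(n)
    return any(order_mod(a, n) == len(group) for a in group)


def has_predicted_form(n):
    """True if n is 2, 4, p**m or 2*p**m with p an odd prime and m >= 1."""
    if n in (2, 4):
        return True
    if n % 2 == 0:
        n //= 2
    if n < 3 or n % 2 == 0:
        return False
    # n is odd here, so its smallest divisor >= 3 is an odd prime.
    p = next(d for d in range(3, n + 1) if n % d == 0)
    while n % p == 0:
        n //= p
    return n == 1


if __name__ == "__main__":
    print([n for n in range(2, 40) if is_cyclic(n)])
    # Order of 5 modulo 2**m, to compare with 2**(m-2) from Theorem 6 (ii).
    print([(m, order_mod(5, 2 ** m)) for m in range(3, 8)])
```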
EXERCISES


1. Prove that a non-zero element $p$ of a unique factorization domain $R$ is
prime if and only if $R/pR$ is an integral domain.

2. Prove that, if an integral domain $R$ is not a field, then $R[X]$ is not a
principal ideal domain.

3. Prove that the set of elements $x + y\sqrt{-3}$ with either $x, y \in \mathbb{Z}$ or else
$x - \tfrac12,\ y - \tfrac12 \in \mathbb{Z}$ is an integral domain $R$. Show that it is a Euclidean ring with function
$\delta = N$ (the norm in $\mathbb{Q}(\sqrt{-3})$). Show that the subring $\mathbb{Z}[\sqrt{-3}] \subset R$ is not even a unique
factorization domain.

4. Find all of the prime elements of the ring of Gaussian integers.

5. In the case when $R$ is a unique factorization domain, refine the corollary to
Theorem 5 by introducing the elements $a_i' = \prod_{j \ne i} a_j$. Find $b_i \in R$ such that
$b_i \equiv 1 \pmod{a_i}$ and $b_i \equiv 0 \pmod{a_i'}$ for $i = 1, 2, \dots, n$.
Introduce the element $x = \sum b_i x_i$, and verify that $x \equiv x_i \pmod{a_i}$, $1 \le i \le n$ (this
approach is especially convenient when $n$ is large).

6. Apply the previous exercise to $a_1 = 5$, $a_2 = 9$, and to the pairs $(x_1, x_2) =
(2, 5)$, $(3, 2)$, and $(3, 5)$. What can you say about $x$ considered modulo 45?


7. Let $p$ be an odd prime. If the congruence $x^2 \equiv a \pmod p$ has a solution,
then $a$ is called a quadratic residue modulo $p$; otherwise it is called a quadratic
non-residue. The Legendre symbol $(a/p)$ is defined as follows:

$(a/p) = 0$, if $a \equiv 0 \pmod p$;
$(a/p) = 1$, if $a \not\equiv 0 \pmod p$ and $a$ is a quadratic residue;
$(a/p) = -1$, if $a \not\equiv 0 \pmod p$ and $a$ is a quadratic non-residue.

Show that $(a/p) = 1$ if and only if $a + p\mathbb{Z} \in (\mathbb{F}_p^*)^2$, and that $(a/p) \equiv a^{(p-1)/2} \pmod p$.
Furthermore, $(ab/p) = (a/p)(b/p)$, and the number of quadratic residues in
$\{1, 2, \dots, p-1\}$ is equal to the number of nonresidues. Verify the following so-called
quadratic reciprocity law for a few small values of odd prime numbers $p$ and $q$:

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

This law was proved in general by Gauss, who gave several different proofs. Use Example 1
in Subsection 1 to derive the relation $(-1/p) = (-1)^{(p-1)/2}$.

8. Prove that (in the notation of the previous exercise) $(2/p) = (-1)^{(p^2-1)/8}$,
i.e., 2 is a quadratic residue of primes of the form $8k \pm 1$ and a quadratic
non-residue of primes of the form $8k \pm 3$.


9. (Supplement to Subsection 5 §2 Ch. 3.) Let $f(X) = f(\dots, x_{ij}, \dots)$ be a non-zero
polynomial with coefficients in $\mathbb{Z}$ or in a field, where the $x_{ij}$ are $n^2$ independent
variables. We consider $f$ as a function on matrices $X = (x_{ij})$. Prove that if $f(XY) =
f(X)f(Y)$ for all $X, Y \in M_n(R)$, where $R$ is a ring, then $f(X) = (\det X)^m$, where
$m$ is some non-negative integer. In particular, if $f(\operatorname{diag}(x, 1, 1, \dots, 1)) = x$, then it
follows that $f(X) = \det X$.


10. Show that the ring $Q_p(\mathbb{Z})$ of all rational numbers $a/b$ with $b$ not
divisible by a fixed prime $p$ (see Exercise 6 of §4 Ch. 5) contains a unique maximal
ideal

$$J = \{a/b \in Q_p(\mathbb{Z}) \mid p \text{ divides } a\}.$$

A ring which has a unique maximal ideal is called a local ring. (Hint. $J$ is
obviously a proper ideal in $Q_p(\mathbb{Z})$. If $c/d \notin J$, then $c \notin p\mathbb{Z}$, and hence
$d/c \in Q_p(\mathbb{Z})$. This means that any ideal $L$ obtained from $J$ by adding just a
single element $c/d$ must contain 1, and therefore must coincide with all of
$Q_p(\mathbb{Z})$.)

11. Show that in any local ring $R$ with maximal ideal $M$, the elements not
in $M$ are invertible.


12. Let $R$ be a ring with unit. An ideal $P$ is called a prime ideal if the
quotient ring $R/P$ is an integral domain. Every maximal ideal is prime. The
complement $M = R - P$ of $P$ in $R$ is a multiplicative subset of $R$ (a monoid
not containing zero). The ring $Q_M(R)$ in this case is usually denoted simply
$R_P$. Show that the ring $R_P$ is always local, and that its maximal ideal $M_P$
consists of fractions of the form $a/b$, where $a \in P$ and $b \in R - P$. Further show
that $R_P/M_P \cong Q(R/P)$.
The operation of going from $R$ to the local ring $R_P$ is called "localization"
of $R$ relative to the prime ideal $P$.


§3. Modules


The idea of a module incorporates a fundamental principle which developed in algebra 
about a half century ago: if we are interested in an algebraic system, we should study not 
only the internal properties of this system, but also all of its representations (in the broadest 


possible sense of the word). 


1. Basic facts about modules. We begin with the classical definition. Let $R$ be
an associative ring with unit, and let $V$ be an abelian group written additively. Further
suppose that we are given a map $(x, v) \mapsto xv$ from $R \times V$ to $V$ which satisfies the
conditions:

(M1) $x(u + v) = xu + xv$,
(M2) $(x + y)v = xv + yv$,
(M3) $(xy)v = x(yv)$,
(M4) $1v = v$

for all $x, y \in R$ and $u, v \in V$. Then $V$ is called a left $R$-module (or a left
module over $R$). We similarly define a right $R$-module. In what follows we shall speak
simply of an $R$-module, even though in some situations we may want to deal with both types
of modules at the same time.


Of course, if we want to make this definition when R does not have a unit, then we 


omit axiom (M4). Moreover, it is possible to modify the axiom (M3) for use with non- 


associative rings. We shall give an example of a module over a non-associative ring at the 


end of the section. For now we shall be content with the above definition. 


Let $V$ be an $R$-module. A subgroup $U \subset V$ is called a submodule of $V$ if
$xu \in U$ for all $x \in R$ and $u \in U$.

Now suppose that $U$ and $V$ are arbitrary $R$-modules. By a homomorphism of
$R$-modules (or simply an $R$-homomorphism) from $U$ to $V$ we mean a map
$\sigma: U \to V$ such that

$$\sigma(u_1 + u_2) = \sigma(u_1) + \sigma(u_2),$$
$$\sigma(xu) = x\sigma(u)$$

for all $u_1, u_2, u \in U$ and $x \in R$. It is easy to verify that $\operatorname{Ker} \sigma = \{u \in U \mid \sigma(u) = 0\}$
is an $R$-submodule of $U$, and $\operatorname{Im} \sigma$ is an $R$-submodule of $V$.

Given a submodule $U \subset V$ over $R$, we define the quotient module $V/U =
\{v + U \mid v \in V\}$ to be the quotient group of the abelian group $V$ by $U$ with the action of
$R$ defined by the rule:

$$x(v + U) = xv + U.$$

The fundamental isomorphism theorem and the two isomorphism theorems proved first for
groups (§3 Ch. 7) and then for rings carry over exactly, with only small changes in the
proofs, to the case of modules.

After §2 Ch. 7, where axioms like (M3) and (M4) were studied, and after 
Chapter 8 on group representations (with the axioms (M1), (M3) and (M4)), our examples 


of R-modules will hardly seem very new. Nevertheless, it is worthwhile to discuss and 


compare these examples. 


1) Every abelian group $A$ is a $\mathbb{Z}$-module. Namely, the map $(n, a) \mapsto na$
from $\mathbb{Z} \times A$ to $A$ satisfies all of the axioms (M1) - (M4). It is often very useful to
think of an abelian group as a $\mathbb{Z}$-module.

2) Every abelian group $A$ is a module over its endomorphism ring $\operatorname{End} A$. By
definition, $\operatorname{End} A$ consists of all maps $\varphi: A \to A$ which satisfy the condition
$\varphi(a + a') = \varphi(a) + \varphi(a')$. The addition and multiplication operations in $\operatorname{End} A$ are
introduced in the natural way: $(\varphi + \psi)(a) = \varphi(a) + \psi(a)$, $(\varphi\psi)(a) = \varphi(\psi(a))$, $1(x) = x$,
$0(x) = 0$. The map $(\varphi, a) \mapsto \varphi(a)$ from $\operatorname{End} A \times A$ to $A$ clearly gives $A$ the
structure of an $\operatorname{End} A$-module.

3) A vector space $V$ over a field $K$ is obviously a $K$-module. In addition,
if we have a fixed linear operator $\mathcal{A}: V \to V$, then we can give $V$ the structure of a
$K[X]$-module, which we denote $V_{\mathcal{A}}$, by setting

$$f(X)v = f(\mathcal{A})v = a_0 v + a_1 \mathcal{A}v + \dots + a_k \mathcal{A}^k v$$

for any $v \in V$ and any polynomial $f = a_0 + a_1 X + \dots + a_k X^k \in K[X]$. The axioms (M1) - (M4) hold, since if $\mathcal{A}$
is linear, then so is $f(\mathcal{A})$, and also

$$(f + g)(\mathcal{A}) = f(\mathcal{A}) + g(\mathcal{A}), \qquad (fg)(\mathcal{A}) = f(\mathcal{A})\,g(\mathcal{A})$$

(this is the universal property of polynomial rings; see §2 Ch. 5). The submodules of $V_{\mathcal{A}}$
are the $\mathcal{A}$-invariant subspaces. In general, different operators $\mathcal{A}$ give us different
(non-isomorphic) $K[X]$-module structures $V_{\mathcal{A}}$ on the same space $V$.

4) Any left ideal $J$ of a ring $R$ has a natural $R$-module structure with action
$(x, y) \mapsto xy$ for $x \in R$, $y \in J$, which is induced simply by multiplication in $R$. In the
case $J = R$, this means that we are regarding $R$ as a module over itself. This view of
$R$ can lead to important results.

5) In the situation of the previous example, we construct the quotient module
$R/J = \{y + J \mid y \in R\}$. According to the general definition, $(x, y + J) \mapsto xy + J$ is the
action of $R$ on $R/J$. Note that the canonical epimorphism $\pi: R \to R/J$, which is an
$R$-module homomorphism, satisfies the relation $\pi(xy) = xy + J = x(y + J) = x\pi(y)$. But
if $J$ is a two-sided ideal, then $R/J$ is a ring, and $\pi$ is a ring homomorphism:
$\pi(xy) = \pi(x)\pi(y)$.
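Example 3 can be made concrete with a small computation. In the sketch below, which is only an illustration, a linear operator on $K^n$ is stored as a list of rows, a polynomial $f = a_0 + a_1 X + \dots + a_k X^k$ as the coefficient list `[a_0, ..., a_k]`, and `act` (our name) computes $f(\mathcal{A})v$ by Horner's scheme; integers stand in for the field $K$ to keep the arithmetic exact.

```python
def mat_vec(A, v):
    """Apply the operator with matrix A (list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def act(f, A, v):
    """Return f(A)v for f = [a_0, a_1, ..., a_k], using Horner's scheme."""
    result = [0] * len(v)
    for coeff in reversed(f):
        # result <- A * result + coeff * v
        result = [r + coeff * x for r, x in zip(mat_vec(A, result), v)]
    return result


def poly_mul(f, g):
    """Product of two polynomials given by coefficient lists."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod


if __name__ == "__main__":
    A = [[0, 1], [-1, 0]]            # rotation by 90 degrees, A^2 = -E
    v = [1, 0]
    print(act([1, 0, 1], A, v))      # X^2 + 1 acts on v
    f, g = [2, 3], [1, -1, 4]
    # Axiom (M3): (fg)v and f(gv) should agree.
    print(act(poly_mul(f, g), A, v), act(f, A, act(g, A, v)))
```

Since $X^2 + 1$ kills every vector here, this particular $V_{\mathcal{A}}$ is a periodic module in the sense introduced later in this subsection.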

The intersection $\bigcap V_i$ of any family of submodules $V_i \subset V$ is a submodule of
$V$. In particular, the intersection of all submodules containing a fixed set $T \subset V$ gives
us the submodule $\langle T \rangle$ spanned by the set $T$. It consists of all possible elements of the
form $x_1 t_1 + x_2 t_2 + \dots + x_s t_s$, where $x_i \in R$ and $t_i \in T$. In passing, we note that
non-zero elements $t_1, \dots, t_s \in V$ are called linearly dependent over $R$ if we have
$x_1 t_1 + \dots + x_s t_s = 0$, where not all of the $x_i = 0$. The submodule spanned by a family
$\{V_1, \dots, V_n\}$ of submodules is called the sum of the $V_i$ and is denoted in the usual
way: $\sum V_i = V_1 + \dots + V_n$.

An $R$-module which is generated by a single element $v$ is called cyclic. It has
the form $V = Rv = \{xv \mid x \in R\}$, where $v \in V$; this is the analog of a cyclic group. In
particular, $R$ itself is the cyclic $R$-module $R \cdot 1$ (see Example 4 above).

If $V = Rv_1 + \dots + Rv_n$ is a finite sum of cyclic modules, then $V$ is called
finitely generated or an $R$-module of finite type.

It is easy to check that the map $x \mapsto xv$ is an $R$-module homomorphism from $R$
to $Rv$. Its kernel is denoted $\operatorname{Ann}(v) = \operatorname{Ann}_R(v) = \{x \in R \mid xv = 0\}$. $\operatorname{Ann}(v)$ is a left
ideal in $R$, which is called the annihilator of the element $v$. Thus, $Rv \cong R/\operatorname{Ann}(v)$.
An element $v \in V$ with non-zero annihilator is called periodic. A module all of whose
elements are periodic is also called periodic. If $V$ does not contain any non-zero
periodic elements, then $V$ is called torsion-free.


The annihilator of an $R$-module $V$ is the set

$$\operatorname{Ann}(V) = \{a \in R \mid aV = 0\} = \bigcap_{v \in V} \operatorname{Ann}(v).$$

The module is called faithful if $\operatorname{Ann}(V) = 0$.

We can arrive at the same concepts from another point of view. Let $V(x)$ be the
set of elements $v \in V$ which are killed by $x \in R$. If $R$ is an integral domain, then
$V(x) + V(y) \subset V(xy)$, and it makes sense to speak of the torsion submodule
$\operatorname{Tor}(V) = \bigcup_{0 \ne x \in R} V(x)$. When $\operatorname{Tor}(V) = V$, we say that $V$ is a torsion module. On the
other hand, if $\operatorname{Tor}(V) = 0$, we say that $V$ is torsion-free.

The basic examples of periodic modules are: a) any finite abelian group (considered
as a $\mathbb{Z}$-module; its torsion is $m\mathbb{Z}$, where $m$ is the exponent of the group); b) the
module $V_{\mathcal{A}}$ over $K[X]$ corresponding to a fixed linear operator $\mathcal{A}$ (see Example 3;
the torsion is the principal ideal generated by the minimal polynomial of $\mathcal{A}$).


PROPOSITION 1. $\operatorname{Ann}(V)$ is always a two-sided ideal of $R$. Setting
$(x + \operatorname{Ann}(V))v = xv$ gives $V$ the structure of an $(R/\operatorname{Ann}(V))$-module.

Proof. Set $A = \operatorname{Ann}(V)$. $A$ is clearly an additive subgroup of $R$. Next,
$(xax')v = xa(x'v) = (xa)v' = x(av') = x \cdot 0 = 0$ for any $x, x' \in R$, $a \in A$, and
$v \in V$, where $v' = x'v$. Hence, $RAR \subset A$, i.e., $A$ is a two-sided ideal in $R$. Now if $x + A =
x' + A$, then $x - x' \in A$, so that $(x - x')v = 0$, i.e., $xv = x'v$ for all $v \in V$, or
$(x + A)v = (x' + A)v$, i.e., the action of the quotient ring $R/A$ on $V$ is correctly
defined. It is not hard to verify that $V$ is an $R/A$-module under this action. Finally,

$$(x + A)V = 0 \implies xV = 0 \implies x \in A.$$

Consequently, only the zero element in $R/A$ annihilates $V$. □


Proposition 1 implies that the quotient ring $R/\operatorname{Ann}(V)$ is isomorphic to a subring
of the endomorphism ring $\operatorname{End}(V)$ (see Example 2).

If $V$ and $W$ are two $R$-modules, then the set $\operatorname{Hom}_R(V, W)$ of all
$R$-homomorphisms $\sigma: V \to W$ is an abelian group under the operation of point-wise
addition of homomorphisms:

$$(\sigma + \tau)(xv) = \sigma(xv) + \tau(xv) = x\sigma(v) + x\tau(v) = x(\sigma(v) + \tau(v)) = x((\sigma + \tau)(v)).$$

If $V$ and $W$ are modules over a commutative ring $R$, then the set $\operatorname{Hom}_R(V, W)$
itself is an $R$-module, where we define $x\sigma$ for $x \in R$ and $\sigma \in \operatorname{Hom}_R(V, W)$ to be
the map $v \mapsto x(\sigma(v))$:

$$(x\sigma)(yv) = x \cdot \sigma(yv) = x(y\sigma(v)) = (xy)(\sigma(v)) = (yx)(\sigma(v)) = y(x\sigma(v)) = y((x\sigma)(v)).$$

If $W = V$, then the set $\operatorname{End}_R(V) = \operatorname{Hom}_R(V, V)$ is a ring, where we take
multiplication to be composition of $R$-homomorphisms: $(\varphi \circ \psi)(xv) = \varphi(\psi(xv)) =
\varphi(x\psi(v)) = x\varphi(\psi(v)) = x((\varphi \circ \psi)(v))$. We should keep in mind that when we regard $V$
simply as an abelian group we write $\operatorname{End}_{\mathbb{Z}}(V)$, and, in general, $\operatorname{End}_R(V)$ is a proper
subring in $\operatorname{End}_{\mathbb{Z}}(V)$. When $V$ is a vector space over a field $K$, we often write $\mathcal{L}(V)$
for $\operatorname{End}_K(V)$ and call this ring the algebra of linear operators.

The ring $\operatorname{End}_R(V)$ of $R$-endomorphisms of a module $V$ is also called the
centralizer of $R$ in $V$. The role of this ring is especially significant in the case of
irreducible (also called simple) modules. A module $V$ over a ring $R$ is called
irreducible if: a) $V \ne 0$; b) $0$ and $V$ are the only submodules of $V$; and c)
$RV \ne 0$ (the third condition automatically holds if $R$ is a ring with unit). It is clear that
a non-zero $R$-module $V$ is irreducible if and only if $V = Rv$ is a cyclic module for
any $v \ne 0$ in $V$.

PROPOSITION 2 (Schur's lemma). If $V$ and $W$ are two irreducible $R$-modules
and $\sigma$ is a non-zero $R$-homomorphism from $V$ to $W$, then $\sigma$ is an isomorphism.
Furthermore, $\operatorname{End}_R(V)$ is a division ring (skew field) for any irreducible $R$-module $V$.

For the proof see §4 Ch. 8, where the same basic fact (Theorem 1) is proved for
irreducible $G$-spaces. □


2. Free modules. We call an $R$-module $V$ an (internal) direct sum of the
submodules $V_1, \dots, V_n$ if $V = V_1 + \dots + V_n$ and $V_i \cap \sum_{j \ne i} V_j = 0$ for
$i = 1, \dots, n$. In other words, we write $V = V_1 \oplus \dots \oplus V_n$ (denoting the direct sum
of submodules) if any element $v \in V$ can be written in one and only one way as a linear
combination $v = v_1 + \dots + v_n$, $v_i \in V_i$. If we are given $R$-modules $V_1, \dots, V_n$,
then we define their (external) direct sum in the obvious way (just like the rings) with the
action of $x \in R$ on a row $(v_1, \dots, v_n)$, $v_i \in V_i$, defined by: $x(v_1, \dots, v_n) =
(xv_1, \dots, xv_n)$.

Now suppose that $V$ is an $R$-module and $\{v_1, \dots, v_n\}$ is a finite subset in
$V$. We say that $v_1, \dots, v_n$ are free generators of $V$ if $V = Rv_1 + \dots + Rv_n$ and
if every map $\varphi$ from the set $\{v_1, \dots, v_n\}$ to any $R$-module $W$ can be extended to
an $R$-homomorphism $\Phi: V \to W$ for which $\Phi(v_i) = \varphi(v_i)$, $i = 1, \dots, n$.


PROPOSITION 3. The following are equivalent:
(i) the set $\{v_1, \dots, v_n\}$ is a set of free generators of $V$;
(ii) the set $\{v_1, \dots, v_n\}$ is linearly independent, and $\langle v_1, \dots, v_n \rangle = V$;
(iii) every element $v \in V$ can be written uniquely in the form $v = \sum x_i v_i$, $x_i \in R$;
(iv) $V$ is the direct sum $Rv_1 \oplus \dots \oplus Rv_n$, and $\operatorname{Ann}(v_i) = 0$, $i = 1, \dots, n$;
(v) $V$ is isomorphic to the direct sum of $n$ copies of $R$ considered as
$R$-modules, i.e., to the module $R^n$ of rows $(x_1, \dots, x_n)$ of length $n$ with
components $x_i \in R$.

The proof is similar to the argument in Chapter 2 in the case of vector spaces over
a field, except that one must be careful not to assume commutativity of $R$ or that every
element has an inverse. □


It is possible to construct (rather complicated) examples of non-commutative rings
$R$ for which $R^m \cong R^n$ for $m \ne n$, but commutative rings behave well in this respect.


PROPOSITION 4. The rank (number of free generators) in a finitely generated free
module over an integral domain $R$ is uniquely determined.

Proof. Let $\{v_1, \dots, v_n\}$ and $\{u_1, \dots, u_m\}$ be two bases for a free module
$V$ over $R$. Then

$$u_i = \sum_{j=1}^{n} a_{ij} v_j, \qquad v_j = \sum_{i=1}^{m} b_{ji} u_i.$$

Since $R$ is commutative, we obtain the relations $AB = E_m$ and $BA = E_n$ for the
matrices $A = (a_{ij})$ and $B = (b_{ji})$, which have dimensions $m \times n$ and $n \times m$,
respectively. By imbedding $R$ in its field of fractions $Q(R)$ and using Theorem 4 §4
Ch. 2 (which holds for any field, not just for $\mathbb{R}$), we find that $\min(n, m) \ge m$ and
$\min(n, m) \ge n$, so that $m = n$. We note that it is impossible to have $m < \infty$ and
$n = \infty$, since only finitely many basis elements $v_j$ occur in the expression for each $u_i$,
and so that finite set of $v_j$ generates all of $V$. □


Remark. This proposition can be proved for any commutative ring $R$ by choosing
a maximal ideal $J$ in $R$ and passing to the field $R/J$. We omit the details.

We note that, unlike in the case of vector spaces over a field, it is not true that any
set which generates a free $R$-module necessarily contains a basis. For example, any
two distinct primes $p$ and $q$ generate the $\mathbb{Z}$-module $\mathbb{Z}$, but $\{p, q\}$ is not a basis (since
the elements have a linear dependence relation $pq - qp = 0$), and $\{p\}$ and $\{q\}$ are
not bases, since they do not generate $\mathbb{Z}$.


THEOREM 1. Every $R$-module of finite type is a homomorphic image of a free
$R$-module of finite type.

Proof. Let $U = \sum_{i=1}^{n} Ru_i$ be an $R$-module which is generated by $n$ elements
$u_1, \dots, u_n$. We take the free $R$-module $V = R^n$ (see Proposition 3 (v)), and we let $v_i$
be the standard basis element $(0, \dots, 0, 1, 0, \dots, 0)$ with 1 in the $i$-th place. The
map $\Phi: V \to U$ given by $(x_1, \dots, x_n) \mapsto \sum x_i u_i$ clearly expresses $U$ as a
homomorphic image of the free module $V$. □


It is not always true that a submodule of a free module is free, even if the submodule
is a direct summand. Here is a simple example. Let $R = \mathbb{Z}_6$, $U = R(2 + 6\mathbb{Z})$,
$V = R(3 + 6\mathbb{Z})$. Then $R = U \oplus V$ is the direct sum of the $R$-modules $U$ and $V$,
but neither of these modules is free (since $|R| = 6$, while $|U| = 3$, $|V| = 2$).
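The example is small enough to list all elements. The following sketch, with residues modulo 6 represented by the integers 0 to 5 (a convention of ours), checks the three facts used above: the sizes of $U$ and $V$, the equality $U + V = R$, and $U \cap V = 0$.

```python
n = 6

# Cyclic submodules generated by the classes of 2 and 3 in Z_6.
U = sorted({(x * 2) % n for x in range(n)})
V = sorted({(x * 3) % n for x in range(n)})

# All sums u + v, and the intersection of the two submodules.
sums = {(u + v) % n for u in U for v in V}
common = set(U) & set(V)

if __name__ == "__main__":
    print(U, V)
    print(sorted(sums), sorted(common))
```

A free $\mathbb{Z}_6$-module of rank $k$ has $6^k$ elements, which is why neither 3 nor 2 can be the order of a free module.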


THEOREM 2. Let $V = Rv_1 \oplus \dots \oplus Rv_n$ be a free module of rank $n$ over a
principal ideal domain $R$. Then every submodule $U$ of $V$ is a free module of rank
$m \le n$.

Proof. First suppose that $n = 1$, i.e., $V \cong R$. Any submodule $U \subset V$ is
isomorphic to an ideal of $R$, and hence $U = (u) = Ru$. If $u = 0$, then $U = 0$ (we
consider the zero module to be free of rank zero). But if $u \ne 0$, then $au \ne 0$ for all
non-zero $a \in R$, since $R$ is an integral domain. Hence, $U$ is a free (cyclic) module
of rank 1. When $n > 1$ we use induction. We consider the free submodule
$V' = Rv_1 \oplus \dots \oplus Rv_{n-1}$ of rank $n - 1$ in $V$. The quotient module $\bar V = V/V'$ is a
free cyclic module, with generator $\bar v_n = v_n + V'$. It contains the submodule
$\bar U = (U + V')/V'$. If $\bar U = 0$, then $U \subset V'$, and then the theorem is true by the
induction assumption. But if $\bar U \ne 0$, then the case of the theorem which has been proved
tells us that $\bar U$ has a cyclic generator $\bar u_m = u_m + V'$, where $u_m \in U$. First suppose
that $U \cap V' = 0$; then

$$u \in U \implies \bar u = a\bar u_m \implies u - au_m \in U \cap V' = 0 \implies u = au_m,$$

so that $U = Ru_m$ is a free module of rank 1 (here $m = 1$). Finally, suppose
that $U \cap V' \ne 0$. By the induction assumption, the submodule $U \cap V'$ of $V'$, which
has rank $n - 1$, has a free basis $\{u_1, \dots, u_{m-1}\}$, where $0 \le m - 1 \le n - 1$. By an
argument similar to the previous one, we see that $\{u_1, u_2, \dots, u_m\}$ is a free $R$-basis
for $U$. Namely,

$$u \in U \implies \bar u = u + V' \in \bar U \implies \bar u = a_m \bar u_m,\ a_m \in R \implies u - a_m u_m \in U \cap V'$$
$$\implies u - a_m u_m = a_1 u_1 + \dots + a_{m-1} u_{m-1} \implies u = a_1 u_1 + \dots + a_m u_m, \quad m \le n.$$

According to Proposition 3 (ii), we must check that the $u_1, \dots, u_m$ are linearly
independent. But $\sum x_i u_i = 0 \implies x_m \bar u_m = \sum x_i \bar u_i = 0$ in $\bar V$; hence, $x_m = 0$, since
$\bar u_m$ is a basis of $\bar U$. Since $\{u_1, \dots, u_{m-1}\}$ is a free basis in $U \cap V'$, we have:

$$x_1 u_1 + \dots + x_{m-1} u_{m-1} = 0 \implies x_1 = \dots = x_{m-1} = 0.$$ □
m m 


COROLLARY. Every submodule of a module of finite type over a principal ideal
domain is itself a module of finite type.

The proof follows from Theorems 1 and 2 and the second isomorphism theorem
(the theorem on the correspondence between submodules). □


It is not very difficult to obtain a complete description of the modules of finite type 
over a principal ideal domain R. However, the main examples which would interest us in 
such a description (periodic modules over $\mathbb{Z}$ and over $K[X]$; see Examples 1 and 3)
have already been treated (see §5 Ch. 7 andthe Appendix). For a unified module-theoretic 


approach to various problems of this sort, see the list of supplementary reading. 


3. Integral elements of a ring. Let $R$ be an integral domain. An element
$t \in R$ is called integral (integral over $\mathbb{Z}$) if $t$ is a root of a monic polynomial
$X^n + a_1 X^{n-1} + \dots + a_n \in \mathbb{Z}[X]$. If $R$ is a finite algebraic extension of $\mathbb{Q}$ or when
$R$ is the field generated by all complex algebraic numbers, we call such an element $t$ an
algebraic integer. Of course, the set of algebraic integers includes $\mathbb{Z}$. Exercise 9 of
§4 Ch. 6 shows that a rational number $t$ is an algebraic integer if and only if $t \in \mathbb{Z}$.
Next, if we have $a_0 u^n + a_1 u^{n-1} + \dots + a_n = 0$, $a_i \in \mathbb{Z}$, then

$$(a_0 u)^n + a_1 (a_0 u)^{n-1} + a_0 a_2 (a_0 u)^{n-2} + \dots + a_0^{n-1} a_n = 0,$$

and hence any algebraic number can be multiplied
by a suitable $a_0 \in \mathbb{Z}$ to obtain an algebraic integer.

Returning to the general case, we note that it is convenient to treat $R$ as a
$\mathbb{Z}$-module. Any elements $t_1, t_2, \dots, t_n \in R$ generate a sub-$\mathbb{Z}$-module $\mathbb{Z}t_1 + \mathbb{Z}t_2 +
\dots + \mathbb{Z}t_n$ of finite type in $R$. In particular, if $t$ is an integral element, with
$t^n + a_1 t^{n-1} + \dots + a_n = 0$, $a_i \in \mathbb{Z}$, then the subring $\mathbb{Z}[t] \subset R$ is a $\mathbb{Z}$-module of
finite type, since $\mathbb{Z}[t] = \mathbb{Z} \cdot 1 + \mathbb{Z}t + \dots + \mathbb{Z}t^{n-1}$. Conversely, suppose that $\mathbb{Z}[t]$
is a $\mathbb{Z}$-module of finite type with generators $v_1, \dots, v_n \in R$. Then the relations

$$t v_i = \sum_{j=1}^{n} a_{ij} v_j, \qquad a_{ij} \in \mathbb{Z}, \quad i = 1, \dots, n,$$

show that the homogeneous linear system

$$(t - a_{11})x_1 - a_{12}x_2 - \dots - a_{1n}x_n = 0,$$
$$\dots\dots\dots$$
$$-a_{n1}x_1 - a_{n2}x_2 - \dots + (t - a_{nn})x_n = 0,$$

considered over the fraction field $Q(R)$, has a non-zero solution $(x_1, \dots, x_n) =
(v_1, \dots, v_n)$ (not all of the $v_i$ are zero, since $1 \in \mathbb{Z}[t]$). Hence, the determinant
of the system is zero (see Chapter 3), and so $t$ is a root of the monic polynomial
$f(T) = \det(TE - A)$, where $A = (a_{ij})$. We have proved that an element $t \in R$ is integral if and only if the
subring $\mathbb{Z}[t] \subset R$ is a $\mathbb{Z}$-module of finite type.
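The criterion can be tried out on a concrete element. Take $t = (1 + \sqrt5)/2$, so that $\mathbb{Z}[t]$ is generated by $v_1 = 1$, $v_2 = t$ with $t v_1 = v_2$ and $t v_2 = v_1 + v_2$. The sketch below computes the coefficients of $\det(TE - A)$ for an integer matrix $A$; the Faddeev-LeVerrier recursion used here is our choice of method, not one taken from the text, and `char_poly` is a hypothetical helper name.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [1, c_1, ..., c_n] of det(T*E - A) = T^n + c_1 T^(n-1) + ... + c_n.

    Faddeev-LeVerrier recursion; every division below is exact for an
    integer matrix, so integer arithmetic suffices.
    """
    n = len(A)
    coeffs = [1]
    M = [[0] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = mat_mul(A, M)                    # M_k = A * M_{k-1} + c_{k-1} * E
        for i in range(n):
            M[i][i] += coeffs[-1]
        AM = mat_mul(A, M)
        trace = sum(AM[i][i] for i in range(n))
        coeffs.append(-trace // k)           # c_k = -tr(A * M_k) / k
    return coeffs


if __name__ == "__main__":
    golden = [[0, 1], [1, 1]]                # rows: t*1 = t, t*t = 1 + t
    print(char_poly(golden))                 # coefficients of T^2 + c_1 T + c_2
```

The same computation applies to multiplication by $\sqrt2 + \sqrt3$ on the $\mathbb{Z}$-module spanned by $1, \sqrt2, \sqrt3, \sqrt6$, whose matrix has rows $(0,1,1,0)$, $(2,0,0,1)$, $(3,0,0,1)$, $(0,3,2,0)$.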


THEOREM 3. The integral elements of a ring $R$ form a subring of $R$.

Proof. Let $u, v \in R$ be integral elements. Then $\mathbb{Z}[u, v] = \sum_{0 \le i, j \le m} \mathbb{Z}u^i v^j$
is a $\mathbb{Z}$-module of finite type. Since $\mathbb{Z}$ is a principal ideal domain, the corollary to
Theorem 2 tells us that the two submodules $\mathbb{Z}[u - v]$ and $\mathbb{Z}[uv]$ are also
$\mathbb{Z}$-modules of finite type. By the above criterion, the elements $u - v$ and $uv$ must
be integral. □

Example. Any root of 1 is obviously an algebraic integer. By Theorem 3, any
integral linear combination of roots of 1 is also an algebraic integer. In particular (see
the proof of the proposition in §4 Ch. 8), the values $\chi_\Phi(g)$, $g \in G$, of the character of
any complex linear representation $\Phi$ of a group $G$ are algebraic integers.


4. Unimodular sequences of polynomials. Let $R = K[X_1, \dots, X_n]$ be the ring of
polynomials in $n$ variables over a field $K$. A sequence $[f_1, \dots, f_r]$ of $r$
polynomials $f_i \in R$ is said to be unimodular if $Rf_1 + Rf_2 + \dots + Rf_r = R$, i.e.,

$$u_1 f_1 + u_2 f_2 + \dots + u_r f_r = 1 \qquad (1)$$

for some $u_i \in R$, $1 \le i \le r$. Further let $V$ be a module of finite type over $R$. In
connection with certain delicate questions in algebraic geometry, the French
mathematician J.-P. Serre in 1955 stated the conjecture:

$$V \oplus R^s \cong R^t \implies V \cong R^{t-s},$$

which can be given the following elegant form: every relation (1) can be written
in the following form for suitable $u_{ij} \in R$:

$$\det \begin{pmatrix} f_1 & f_2 & \cdots & f_r \\ u_{21} & u_{22} & \cdots & u_{2r} \\ \vdots & \vdots & & \vdots \\ u_{r1} & u_{r2} & \cdots & u_{rr} \end{pmatrix} = 1. \qquad (2)$$


Despite its apparent simplicity, this conjecture was only proved in 1976, 
independently by A. A. Suslin (USSR) and D. Quillen (USA) (though the case n=1 was 
done already in 1848 by Hermite). The case n=1 is included in the following 


more general theorem. 


THEOREM 4, Let Ay> Agrees, a. (r>2) be nonzero elements in a vrincival 
ideal domain R, and let d= g-c.d.(a,,--.,a,). Then there exists a matrix 


Ame MCR) with first row (a, 4y0-++98,) and with det A= d. 
Proof. We use the result at the end of subsection 1 §2. If $r = 2$, then,
writing $d$ in the form $d = u_1 a_1 + u_2 a_2$ with $u_i \in R$, we immediately find the
required matrix:

$$\begin{pmatrix} a_1 & a_2 \\ -u_2 & u_1 \end{pmatrix}.$$

We now use induction on $r$. We represent $d' = \mathrm{g.c.d.}(a_1, \ldots, a_{r-1})$ in the form
$d' = \det A'$, where $A' \in M_{r-1}(R)$ is a matrix with first row $(a_1, \ldots, a_{r-1})$.
Since $d = \mathrm{g.c.d.}(d', a_r)$, we can write $d = ud' + va_r$. We introduce the matrix

$$A = \begin{pmatrix} & & & a_r \\ & A' & & 0 \\ & & & \vdots \\ & & & 0 \\ -\dfrac{v a_1}{d'} & \cdots & -\dfrac{v a_{r-1}}{d'} & u \end{pmatrix}.$$

The first row of this matrix is $(a_1, \ldots, a_r)$. If we expand $\det A$ along the last
column, we find that

$$\det A = u \det A' + (-1)^{r+1} a_r \det A'' = ud' + (-1)^{r+1} a_r \det A'', \qquad (3)$$

where $A''$ is the matrix obtained from $A$ by removing the first row and last column.

On the other hand, if we multiply the first row of $A'$ by $-v$ and then put
this row in the last place, preserving the order of the other rows (this can be done
by $r-2$ successive transpositions), then we obtain a matrix $A'''$, which is the matrix we
get if we multiply the last row in $A''$ by $d'$. Thus,

$$d' \det A'' = \det A''' = (-v)(-1)^{r-2} \det A' = (-1)^{r-1} v d'.$$

We substitute this expression in the relation which is obtained from (3) by multiplying
both sides by $d'$:

$$d' \det A = u d'^2 + (-1)^{r+1} a_r \cdot (-1)^{r-1} v d' = d'(ud' + va_r);$$

canceling $d'$, we arrive at the required relation $d = ud' + va_r = \det A$. □
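The inductive construction in the proof can be carried out mechanically. The following is a minimal sketch for the special case $R = \mathbb{Z}$, assuming Python integers as ring elements; the helper names `ext_gcd`, `completion` and `det` are ours and purely illustrative.

```python
from functools import reduce
from math import gcd


def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and x*a + y*b == g."""
    old_r, r, old_x, x, old_y, y = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def completion(a):
    """Integer matrix with first row a (nonzero entries) and determinant gcd(a)."""
    if len(a) == 2:
        d, u1, u2 = ext_gcd(a[0], a[1])
        return [[a[0], a[1]], [-u2, u1]]
    a_prev, a_r = a[:-1], a[-1]
    A1 = completion(a_prev)        # the matrix A' of the proof, det A' = d'
    d1 = reduce(gcd, a_prev)       # d' = g.c.d.(a_1, ..., a_{r-1})
    d, u, v = ext_gcd(d1, a_r)     # d = u*d' + v*a_r
    top = [A1[0] + [a_r]]
    middle = [row + [0] for row in A1[1:]]
    bottom = [[-v * x // d1 for x in a_prev] + [u]]   # exact: d' divides each a_i
    return top + middle + bottom


def det(M):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))
```

For instance, `completion([6, 10, 15])` has first row `[6, 10, 15]`, and `det` of the result equals the greatest common divisor of the entries.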




For $n > 1$ the basic idea of the proof is to study the action of the group
$GL_r(K[X_1, \ldots, X_n])$ on the set of unimodular sequences and proceed by induction
on $n$. The reader can read about the proof either in the original article (A. A.
Suslin, Projective modules over a polynomial ring are free, DAN SSSR, 229, No. 5
(1976), 1063-1066) or else in the Bourbaki report by D. Ferrand (Sém. N. Bourbaki,
28ème année, 1975/76, juin 1976). The exposition is completely elementary. To
appreciate the effort required to attain such a proof, one need only look at the
earlier Bourbaki report by H. Bass (Sém. N. Bourbaki, 26ème année, 1973/74, juin
1974). These references contain the statements of some unsolved problems. This
whole circle of questions is a good topic for discussion in advanced seminars at
the graduate or undergraduate level.


§4. Algebras over a field


1. Definitions and examples of algebras. Until now we have not made much use of 


the fact that almost all of the rings we know are also vector spaces over a field. 


Definition. An algebra over a field $K$ is a pair consisting of a ring $(A, +, \cdot)$
and a vector space $A$ over $K$ (the underlying set $A$ is the same for the ring and the
vector space, as are the addition operation and the zero element). Here

$$\lambda(xy) = (\lambda x)y = x(\lambda y)$$

for all $\lambda \in K$, $x, y \in A$. An algebra is called associative if the ring $(A, +, \cdot)$ is


associative. By the dimension of the algebra A we mean the dimension of the vector space 


A over K. 




The basic notions of ring theory carry over to algebras. For example, a subalgebra
of an algebra $A$ is a subring $B \subset A$ which is also a vector subspace of the vector space
$A$. If $T$ is a subset of $A$, then the subalgebra $K[T]$ generated by $T$ is defined
to be the intersection of all subalgebras of $A$ which contain $T$. We similarly define
ideals and quotient algebras. By an algebra homomorphism we mean a ring homomorphism
which is at the same time a $K$-linear map.

The center $Z(A)$ of an associative algebra $A$ is defined as the set of all elements
$a \in A$ which commute with every element of $A$: $a \in Z(A) \iff ax = xa$, $\forall x \in A$. It is
easy to check that the center is a subalgebra of $A$:

$$(a - a')x = ax - a'x = xa - xa' = x(a - a'),$$
$$(aa')x = a(a'x) = a(xa') = (ax)a' = x(aa'),$$
$$(\lambda a)x = \lambda(ax) = \lambda(xa) = x(\lambda a)$$

for all $a, a' \in Z(A)$, $\lambda \in K$. We have $Z(A) = A$ if and only if $A$ is a
commutative algebra.

If $A$ is an associative algebra with unit 1, then we immediately see that
$\lambda \cdot 1 \in Z(A)$, and the map $\lambda \mapsto \lambda \cdot 1$, $\lambda \in K$, gives a monomorphism from $K$ to
$A$. Hence, we may consider an algebra to be a ring $A$ together with a specified subfield
which is contained in its center $Z(A)$.


Here are some examples of algebras. 


1) An extension $F \supset K$ of finite degree $[F:K]$ over a field $K$ is obviously a
commutative associative algebra (with unit) having finite dimension $\dim_K F = [F:K]$. We
have already studied this example in §1.

2) The polynomial ring $R = K[X_1, \ldots, X_n]$ with coefficients in a field $K$ has
the structure of an infinite dimensional commutative associative $K$-algebra. Note that

$$R = R_0 \oplus R_1 \oplus R_2 \oplus \cdots$$

is a direct sum of the finite dimensional vector subspaces $R_m$ consisting of homogeneous
polynomials of total degree $m$. We have $R_0 = K$, and $R_i R_j \subset R_{i+j}$. In this
situation we call the algebra $R$ graded.

3) The commutative algebra $X_{\mathbb{C}}(G)$ with unit $\chi_1$ which is generated over $\mathbb{C}$
by all of the characters of a finite group $G$, has dimension $r$ equal to the number of
conjugacy classes in $G$ (Theorem 2 §7 Ch. 8).

4) The ring $M_n(K)$ of $n \times n$ matrices with entries in a field $K$ is an algebra
of dimension $n^2$ over $K$. The basis elements $\{E_{ij} \mid i, j = 1, 2, \ldots, n\}$ of the
algebra $M_n(K)$ multiply together according to the rule $E_{ik}E_{lj} = \delta_{kl}E_{ij}$. According to
Theorem 3 §3 Ch. 2, we have $Z(M_n(K)) = \{\lambda E\} \cong K$.


We call an associative algebra $A$ with unit central simple over $K$ if $Z(A) = K$,
and if $A$ has no two-sided ideals other than $0$ and $A$.

PROPOSITION 1. $M_n(K)$ is a central simple algebra.

Proof. It remains to show that any ideal $J$ in $M_n(K)$ which is not the zero ideal
must be all of $M_n(K)$. Let

$$0 \ne a = \sum_{i,j} a_{ij}E_{ij} \in J.$$

If $a_{kl} \ne 0$, then $E_{st} = a_{kl}^{-1}E_{sk}\,a\,E_{lt} \in J$ for any $s, t = 1, \ldots, n$, i.e.,
$J = M_n(K)$. □


Proposition 1 also holds for the full matrix algebra $M_n(D)$ over an arbitrary
division ring $D$. The extremely important Wedderburn theorem (which, in a more general
context, is known as the Wedderburn-Artin theorem) says that, conversely, every finite
dimensional associative simple algebra over a field $K$ is isomorphic to $M_n(D)$, where
the natural number $n$ is uniquely determined, and the division ring $D$ (which is a finite
dimensional algebra over $K$) is uniquely determined up to isomorphism.

The matrix algebra $M_n(K)$ also has the following universal property.

PROPOSITION 2. Any $n$-dimensional associative algebra $A$ over a field $K$ is
isomorphic to some subalgebra of $M_k(K)$, where $k \le n+1$.




Proof. First suppose that $A$ is an algebra with unit 1; we shall imbed it in
$M_n(K)$. To do this, we associate to every $a \in A$ the linear operator $L_a : x \mapsto ax$ from
$A$ to $A$. $L_a$ is linear because of the bilinearity of multiplication in $A$. Since
obviously $L_{\lambda a} = \lambda L_a$, $L_{a+b} = L_a + L_b$, $L_{ab} = L_a L_b$ (by associativity!), and
$L_1 = \mathcal{E}$, it follows that the map $\varphi : a \mapsto L_a$ is a homomorphism. Injectivity is ensured
by the presence of the unit element: $a \ne 0 \Rightarrow L_a(1) = a \cdot 1 = a$, so that $L_a \ne 0$.

Now suppose that $A$ does not have a unit element. Consider the vector space
$\tilde{A} = K \oplus A$, and define a multiplication on $\tilde{A}$ by setting
$(\lambda, a)(\lambda', a') = (\lambda\lambda', aa' + \lambda a' + \lambda' a)$. It is easy to verify that with this
multiplication operation $\tilde{A}$ becomes a $K$-algebra with unit element $(1, 0)$. Since
$\dim_K \tilde{A} = \dim_K A + 1 = n + 1$, the above argument allows us to imbed $\tilde{A}$, and
thus $A$, in $M_{n+1}(K)$. □
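The imbedding $a \mapsto L_a$ can be made concrete once an algebra is given by a multiplication table on a basis. The sketch below assumes that encoding (`table[i][j]` holds the coordinates of $e_i e_j$); the function names and the choice of $\mathbb{C}$ over $\mathbb{R}$ as the illustration are ours.

```python
def left_mult_matrix(a, table):
    """Matrix of L_a : x -> a*x in the basis e_0, ..., e_{n-1}.

    a      -- coordinates of a in that basis
    table  -- table[i][j] is the coordinate list of the product e_i * e_j
    """
    n = len(a)
    # Column j holds the coordinates of a * e_j = sum_i a[i] * (e_i * e_j).
    return [[sum(a[i] * table[i][j][k] for i in range(n)) for j in range(n)]
            for k in range(n)]


def product(a, b, table):
    """Coordinates of a*b computed from the multiplication table."""
    n = len(a)
    return [sum(a[i] * b[j] * table[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]


def mat_mul(P, Q):
    """Ordinary product of two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Illustration: C as a 2-dimensional R-algebra with basis e_0 = 1, e_1 = i.
COMPLEX_TABLE = [[[1, 0], [0, 1]],
                 [[0, 1], [-1, 0]]]
```

With this table, `left_mult_matrix([a, b], COMPLEX_TABLE)` is the familiar matrix with rows `[a, -b]` and `[b, a]`, and the homomorphism property $L_{ab} = L_a L_b$ can be checked by comparing `left_mult_matrix(product(a, b, table), table)` with `mat_mul` of the two individual matrices.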


There is a close resemblance between the proof of Proposition 2 and that of Cayley's 
theorem for finite groups. In both cases we used the regular representation. More 


generally, by a representation of a $K$-algebra $A$ we mean any homomorphism
$A \to \mathcal{L}(V) = \mathrm{End}_F(V)$, where $F \supset K$ is a field extension of $K$. In other words, we
supply the $F$-vector space $V$ with a left $A$-module structure in the sense of the
definition in §3, and we have

$$(\lambda x)v = x(\lambda v) = \lambda(xv) \quad \text{for all } \lambda \in K,\ x \in A,\ v \in V.$$

If we choose a basis in $V$, we arrive, as in the case of groups, at a matrix representation
$A \to M_r(F)$, where $r = \dim_F V$.


2. Division rings (skew fields). As the above theorem of Wedderburn indicates, the


study of division algebras is an important part of the general structure theory for associative 


algebras. Schur's lemma (Proposition 3 §3) also supports this observation. Before giving 


some results on division algebras, we take up the following auxiliary fact. 




PROPOSITION 3. In an associative algebra $A$ (with unit element 1) having
dimension $n$ over a field $K$, every element $a \in A$ is a root of a polynomial
$f_a \in K[X]$ of degree $\le n$. An element $a \in A$ is invertible if and only if $f_a(0) \ne 0$,
where $f_a$ denotes the monic polynomial of least degree. If $A$ has no zero divisors
then $A$ is a division algebra. If, moreover, $K$ is algebraically closed, then $n = 1$ and $A = K$.


Proof. Since $A$ is finite dimensional, the elements $1, a, a^2, \ldots$ cannot all be
linearly independent over $K$. Hence there exists a monic polynomial
$f_a(X) = X^m + \alpha_1 X^{m-1} + \cdots + \alpha_{m-1}X + \alpha_m \ne 0$ of minimal degree $m \le n$, with
coefficients $\alpha_i \in K$, such that $f_a(a) = 0$. If $\alpha_m \ne 0$, then the equality $f_a(a) = 0$,
written in the form

$$\bigl(-\alpha_m^{-1}a^{m-1} - \alpha_m^{-1}\alpha_1 a^{m-2} - \cdots - \alpha_m^{-1}\alpha_{m-1}\bigr)a = 1,$$

shows that $a$ is invertible. Conversely, suppose that $a \in A$ is not a zero divisor, but
$\alpha_m = 0$. Then

$$(a^{m-1} + \alpha_1 a^{m-2} + \cdots + \alpha_{m-1})a = 0 \;\Longrightarrow\; a^{m-1} + \alpha_1 a^{m-2} + \cdots + \alpha_{m-1} = 0,$$

which contradicts the minimality of $f_a(X)$. Hence, $\alpha_m \ne 0$. In particular, all elements
of $A$ which are not zero divisors are invertible.

If the field $K$ is algebraically closed, then $f_a(X) = (X - c_1)(X - c_2) \cdots (X - c_m)$,
$c_i \in K$, so that $(a - c_1)b = 0$, where $b = (a - c_2) \cdots (a - c_m) \ne 0$. Since $A$ has no
zero divisors, the only possibility is that $m = 1$ and $a - c_1 = 0$, and so $a = c_1 \in K$.
Since this is true for any $a \in A$, we have $A = K$. □

We see that the properties of a division algebra depend in an essential way on the
ground field $K$. It is natural that, historically, division algebras over the real numbers
$\mathbb{R}$ have aroused special interest. The existence of the field $\mathbb{C} = \mathbb{R} + i\mathbb{R}$ inspired the
search for other "hypercomplex systems", i.e., division algebras over $\mathbb{R}$. This search
met with success in 1843, when Hamilton constructed his famous algebra of real quaternions.

Example (the quaternion algebra $\mathbb{H}$). Formally, we write

$$\mathbb{H} = \mathbb{R} + \mathbb{R}i + \mathbb{R}j + \mathbb{R}k,$$

where $i, j, k$ are quantities which are multiplied together according to the rule

$$i^2 = j^2 = k^2 = -1, \quad ij = -ji = k, \quad jk = -kj = i, \quad ki = -ik = j.$$

An element $x = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k \in \mathbb{H}$ is called a quaternion. It can be verified
directly that $\mathbb{H}$ is an associative algebra with center $Z(\mathbb{H}) = \mathbb{R}$. But it is more
worthwhile first to consider the following model of the algebra $\mathbb{H}$: the set

$$\Phi(\mathbb{H}) = \left\{ \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} \;\middle|\; a, b \in \mathbb{C} \right\} \subset M_2(\mathbb{C}).$$


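It is an elementary exercise to show that $\Phi(\mathbb{H})$ is a division ring. We did a similar
exercise in §1 Ch. 5, when the field $\mathbb{C}$ was introduced. We need only remember that
multiplication in $\Phi(\mathbb{H})$ is non-commutative. According to the rule for computing the
inverse of a matrix, we have

$$\begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix}^{-1} = \delta^{-1}\begin{pmatrix} \bar{a} & -b \\ \bar{b} & a \end{pmatrix},$$

where

$$\delta = \det\begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} = a\bar{a} + b\bar{b} \quad (\ne 0 \text{ if } a \ne 0 \text{ or if } b \ne 0).$$

Incidentally, this implies that the multiplicative group $\Phi(\mathbb{H})^* = \Phi(\mathbb{H}) \setminus \{0\}$ contains a
subgroup isomorphic to $SU(2)$ (see §1 Ch. 7).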


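If we set

$$q_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad q_1 = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad q_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad q_3 = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix},$$

we notice that

$$q_1^2 = q_2^2 = q_3^2 = -q_0, \quad q_1 q_2 = -q_2 q_1 = q_3, \quad q_2 q_3 = -q_3 q_2 = q_1, \quad q_3 q_1 = -q_1 q_3 = q_2.$$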
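The relations among $q_0, q_1, q_2, q_3$ are easy to confirm by machine. The sketch below assumes Python's built-in complex numbers and nested lists for $2 \times 2$ matrices; `phi` builds the matrix attached to $\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ with $a = \alpha_0 + i\alpha_1$, $b = \alpha_2 + i\alpha_3$, and all helper names are ours.

```python
def mul(P, Q):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def neg(P):
    """The matrix -P."""
    return [[-x for x in row] for row in P]


def det(P):
    """Determinant of a 2x2 matrix."""
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]


def phi(a0, a1, a2, a3):
    """Matrix [[a, b], [-conj(b), conj(a)]] with a = a0 + i*a1, b = a2 + i*a3."""
    a, b = complex(a0, a1), complex(a2, a3)
    return [[a, b], [-b.conjugate(), a.conjugate()]]


q0 = [[1, 0], [0, 1]]
q1 = [[1j, 0], [0, -1j]]
q2 = [[0, 1], [-1, 0]]
q3 = [[0, 1j], [1j, 0]]
```

Here `mul(q1, q2)` agrees with `q3` while `mul(q2, q1)` agrees with `neg(q3)`, which exhibits the non-commutativity, and `det(phi(a0, a1, a2, a3))` returns the sum of the four squares.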




We see that the map $\Phi : \mathbb{H} \to \Phi(\mathbb{H})$ given by letting $1 \mapsto q_0$, $i \mapsto q_1$, $j \mapsto q_2$,
$k \mapsto q_3$, is a two-dimensional complex representation of the quaternion algebra $\mathbb{H}$. The
quaternion $x$ corresponds to the matrix

$$\Phi_x = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix},$$

where $a = \alpha_0 + i\alpha_1$, $b = \alpha_2 + i\alpha_3$, $i = \sqrt{-1}$. The quaternion units $i, j, k$ generate
the subgroup $Q_8$ of $\mathbb{H}^*$, which is a group of order 8 we encountered before. The
restriction $\Phi|_{Q_8}$ is the irreducible 2-dimensional representation of $Q_8$ that was
constructed at the end of §3 Ch. 7.

To every quaternion $x = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ we associate the conjugate
quaternion $x^* = \alpha_0 - \alpha_1 i - \alpha_2 j - \alpha_3 k$ (this is analogous to complex conjugation). The
conjugation operation has the following obvious properties:

$$(x + y)^* = x^* + y^*; \qquad x^* = x \iff x \in \mathbb{R}; \qquad x^* = -x \iff \alpha_0 = 0$$

(in the latter case we call $x$ a "purely imaginary" quaternion). The product $xx^* = N(x)$
is called the norm of the quaternion $x$. Using $\Phi$, we can easily show that $(xy)^* = y^*x^*$
and $N(xy) = N(x)N(y)$, where $N(x) = \det \Phi(x) = \alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2$.


The unique place occupied by the quaternions is clear from the following theorem of
Frobenius: There are only three finite dimensional associative division algebras over $\mathbb{R}$,
namely, $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{H}$. The essential fact used in the proof is that the minimal
polynomial $f_t(X)$ of any non-zero element $t$ in the division algebra $D$ which is not in $\mathbb{R}$
must be quadratic (see Proposition 3 and Theorem 1 §4 Ch. 6). We shall not give the
proof here.

Relatively recently a proof using deep topological techniques was given for the fact that 


any finite dimensional division algebra over $\mathbb{R}$ (not necessarily associative) has
dimension 1, 2, 4, or 8. Examples actually exist in each of these dimensions.

More than 70 years ago, Wedderburn obtained a beautiful result concerning finite
division rings, which is important in geometry. We shall now prove this theorem, which
bears a direct relationship to the material in §1.


THEOREM 1 (Wedderburn). Every finite associative division ring is commutative. 


Proof. Let $D$ be a finite division ring, and let $Z$ be its center. It is obvious
that $Z$ is a field, and that $D$ is a finite dimensional vector space over $Z$:

$$D = Ze_1 + Ze_2 + \cdots + Ze_n.$$

According to the results of §1, we have $Z \cong \mathbb{F}_q$ for some $q = p^m$, and hence
$|D| = q^n$. Suppose that $x \in D \setminus Z$. The elements of $D$ which commute with $x$ form a
set $C(x) = \{y \in D \mid yx = xy\}$, which is closed under addition and multiplication. In other
words, $C(x)$ is a subdivision algebra of $D$ which contains $Z$. If $q^d$ is the number
of elements in $C(x)$, then $d = d(x)$ is a divisor of $n$ (with $d < n$), since, if we
interpret $D$ as a left vector space over $C(x)$,

$$D = C(x)f_1 + \cdots + C(x)f_k,$$

we have $q^n = |C(x)|^k = q^{dk}$. We now note that $Z^*$ is the center of the multiplicative
group $D^*$, and $(q^n - 1)/(q^d - 1) = (D^* : C(x)^*)$ is the number of elements conjugate to
$x$ in $D^*$. Hence, the formula (2') in §2 Ch. 7 takes the form

$$|D^*| = q^n - 1 = (q - 1) + \sum_{d \mid n,\ d < n} \frac{q^n - 1}{q^d - 1}, \qquad (*)$$

where $d$ runs through some set of divisors of $n$ less than $n$. The properties of the
cyclotomic polynomial $\Phi_n(X)$ proved in §1 show (see Exercise 6 §1) that the integer
$\Phi_n(q)$ divides both $q^n - 1$ and $(q^n - 1)/(q^d - 1)$ for $d \mid n$, $d < n$. Thus, by (*),
$\Phi_n(q) \mid (q - 1)$, and this means (see Exercise 7 §1) that $n = 1$, and so $D = Z$ is
commutative. □
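The two arithmetic facts about $\Phi_n(q)$ used at the end of the proof can be checked numerically for small values. The sketch below assumes only the factorization $q^n - 1 = \prod_{d \mid n} \Phi_d(q)$; the function names are ours, and a finite check of this kind is an illustration, not a substitute for the exercises cited.

```python
def cyclotomic_value(n, q):
    """Phi_n(q), computed from q^n - 1 = product of Phi_d(q) over all d | n."""
    value = q ** n - 1
    for d in range(1, n):
        if n % d == 0:
            value //= cyclotomic_value(d, q)   # exact division at every step
    return value


def check_wedderburn_step(q, n):
    """For n > 1: Phi_n(q) divides every (q^n - 1)/(q^d - 1) with d | n, d < n,
    but does not divide q - 1."""
    phi = cyclotomic_value(n, q)
    divides_terms = all(((q ** n - 1) // (q ** d - 1)) % phi == 0
                        for d in range(1, n) if n % d == 0)
    return divides_terms and (q - 1) % phi != 0
```

For example, `cyclotomic_value(4, 3)` is $3^2 + 1 = 10$, which divides $(3^4 - 1)/(3^2 - 1) = 10$ and $(3^4 - 1)/(3 - 1) = 40$ but not $3 - 1 = 2$.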


3. Group algebras and modules over them. When studying the regular representation
of a finite group $G$ in §1 of Chapter 8, we introduced the vector space
$\langle e_g \mid g \in G \rangle$ over a field $K$. We now make this space into a $K$-algebra by setting
$e_g e_h = e_{gh}$ and extending this rule by linearity to all vectors $\sum_g \alpha_g e_g$, $\alpha_g \in K$. To
simplify notation we shall usually replace $e_g$ by $g$, and consider the set $K[G]$ of all
possible formal sums $\sum_g \alpha_g g$, $\alpha_g \in K$. By definition,
$\sum_g \alpha_g g = \sum_g \beta_g g \iff \alpha_g = \beta_g$, $\forall g \in G$. The following operations on formal sums

$$\sum_g \alpha_g g + \sum_g \beta_g g = \sum_g (\alpha_g + \beta_g) g,$$
$$\lambda \Bigl( \sum_g \alpha_g g \Bigr) = \sum_g (\lambda \alpha_g) g, \qquad (1)$$
$$\Bigl( \sum_g \alpha_g g \Bigr) \Bigl( \sum_h \beta_h h \Bigr) = \sum_{g,h} \alpha_g \beta_h\, gh = \sum_u \gamma_u u, \quad \text{where } \gamma_u = \sum_g \alpha_g \beta_{g^{-1}u},$$

give $K[G]$ the structure of an associative algebra. $K[G]$ is customarily called the
group algebra of the finite group $G$ over the field $K$. The elements $1 \cdot g$ of $K[G]$
for $g \in G$ form a basis for $K[G]$ as a $K$-vector space. Thus, $\dim_K K[G] = |G|$.
We consider the group $G$ to be imbedded in $K[G]$ by $g \mapsto 1 \cdot g$. The identity $e \in G$
is the unit element in $K[G]$. If we take $K$ to be a commutative and associative ring with
unit (but not necessarily a field), we similarly define the group ring $K[G]$.

Furthermore, a similar construction is possible for a group $G$ which is not
assumed to be finite, if we agree only to consider sums $\sum_g \alpha_g g$ having only finitely many
non-zero coefficients. It is sometimes convenient to consider such a formal sum
$A = \sum_g \alpha_g g$ as a function on the group $G$ (defined by $A(g) = \alpha_g$) with values in $K$
which are non-zero for only finitely many $g \in G$. Then the formulas (1) correspond to
the operations of point-wise addition of functions

$$(A_1 + A_2)(g) = A_1(g) + A_2(g)$$

and convolution of functions

$$(A_1 * A_2)(u) = \sum_{g \in G} A_1(g) A_2(g^{-1}u).$$
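To see that the product of formal sums and the convolution of functions are the same operation, one can compute both for a small group. The sketch below assumes $G = S_3$, realized as permutation tuples, with group-algebra elements stored as dictionaries from group elements to coefficients; this encoding and the helper names are ours.

```python
from itertools import permutations

S3 = list(permutations(range(3)))


def compose(g, h):
    """Product gh of two permutations (apply h first, then g)."""
    return tuple(g[h[i]] for i in range(len(h)))


def inverse(g):
    """Inverse permutation of g."""
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)


def multiply(A, B):
    """Product of formal sums: expand sum_{g,h} alpha_g * beta_h * (gh)."""
    C = {}
    for g, alpha in A.items():
        for h, beta in B.items():
            u = compose(g, h)
            C[u] = C.get(u, 0) + alpha * beta
    return {u: c for u, c in C.items() if c != 0}


def convolve(A, B, G):
    """Convolution: (A * B)(u) = sum over g in G of A(g) * B(g^{-1} u)."""
    C = {}
    for u in G:
        c = sum(A.get(g, 0) * B.get(compose(inverse(g), u), 0) for g in G)
        if c != 0:
            C[u] = c
    return C
```

For any two dictionaries `A` and `B` supported on `S3`, `multiply(A, B)` and `convolve(A, B, S3)` return the same dictionary, and the dictionary `{(0, 1, 2): 1}` acts as the unit element.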


The theory of group rings is a vast field of algebra, having its own techniques and 
problems, but for our purposes K[G]_ is introduced merely to illustrate some of the 


general ideas in Chapters 8 and 9. 


THEOREM 2. There exists a one-to-one correspondence between $K[G]$-modules
which are finite dimensional vector spaces over the field $K$ and linear representations of
the group $G$.

Proof. Let $(\Phi, V)$ be a representation of $G$. We extend $\Phi$ by linearity to the
elements of $K[G]$, by defining

$$\tilde{\Phi}\Bigl( \sum_g \alpha_g g \Bigr) = \sum_g \alpha_g \Phi(g),$$

and we set

$$\Bigl( \sum_g \alpha_g g \Bigr) \circ v = \sum_g \alpha_g \Phi(g)v, \quad \forall v \in V.$$

The operation $\circ$ gives $V$ a $K[G]$-module structure in the usual sense. We note that

$$\Bigl( \sum_g \alpha_g g \Bigr) \circ (\lambda v) = \sum_g \alpha_g \Phi(g)(\lambda v) = \sum_g \lambda \alpha_g \Phi(g)v = \Bigl( \lambda \sum_g \alpha_g g \Bigr) \circ v,$$

i.e., scalar multiplication in $V$ and $K[G]$ are compatible. It is natural to call the pair
$(\tilde{\Phi}, V)$ a linear representation of the algebra $K[G]$.

Conversely, if $V$ is a vector space over $K$ which is a $K[G]$-module with
action $\bigl( \sum_g \alpha_g g, v \bigr) \mapsto \bigl( \sum_g \alpha_g g \bigr) \circ v$, then, setting

$$\tilde{\Phi}\Bigl( \sum_g \alpha_g g \Bigr) v = \Bigl( \sum_g \alpha_g g \Bigr) \circ v,$$

we define a homomorphism $\tilde{\Phi} : K[G] \to \mathrm{End}_K(V)$ (i.e., a representation of the algebra
$K[G]$), whose restriction $\Phi = \tilde{\Phi}|_G$ to $G$ gives us a representation of the group $G$. □

Because of Theorem 2, a representation space $V$ of a group $G$ is often called a
$G$-module. Similar terminology also applies to other concepts in representation theory.

Now let $G$ be a finite group, and let $K = \mathbb{C}$ be the field of complex numbers.
According to the results of Chapter 8, every irreducible $G$-module over $\mathbb{C}$ (i.e.,
$\mathbb{C}[G]$-module) with character $\chi_i$ is isomorphic to some left ideal $J_i$ in the algebra
$\mathbb{C}[G]$ (in this connection see Example 4 in §3). If $\dim_{\mathbb{C}} J_i = n_i$, then $\mathbb{C}[G]$ contains
a direct sum $A_i = J_{i,1} \oplus \cdots \oplus J_{i,n_i}$ of $n_i$ left ideals which are $\mathbb{C}[G]$-isomorphic to
$J_i = J_{i,1}$. If we choose one ideal $J_i$ in each isomorphism class of left ideals, we can
write the decomposition

$$\mathbb{C}[G] = A_1 \oplus A_2 \oplus \cdots \oplus A_r, \qquad (2)$$

which corresponds to the decomposition of the regular representation of $G$. We note that
each of the components $A_i$ is uniquely determined.

If $J$ happens to be a minimal left ideal of the algebra $\mathbb{C}[G]$, and if $t \in \mathbb{C}[G]$,
then $Jt$ is also a minimal left ideal (possibly the zero ideal). Hence, the map $\varphi : J \to Jt$
given by $v \mapsto vt$ ($v \in J$) is either the zero map or else a $\mathbb{C}[G]$-isomorphism, since
$xv \in J$ for any $x \in \mathbb{C}[G]$ and $\varphi(xv) = (xv)t = x(vt) = x\varphi(v)$. For this reason
$J \subset A_i \Rightarrow Jt \subset A_i$ for all $t \in \mathbb{C}[G]$, and hence $A_i$ is a two-sided ideal of $\mathbb{C}[G]$.
Since (2) is a direct sum decomposition, we have

$$i \ne j \;\Longrightarrow\; A_i A_j \subset A_i \cap A_j = 0.$$


j j 


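We would like to obtain more precise information concerning the decomposition (2),
using the theory of characters from Chapter 8. We first find the center $Z(\mathbb{C}[G])$ of the
group algebra $\mathbb{C}[G]$. By definition,

$$z \in Z(\mathbb{C}[G]) \iff zg = gz, \quad \forall g \in G.$$

If $z = \sum_{h \in G} \gamma_h h$, then

$$zg = \sum_{h \in G} \gamma_h hg = \sum_{t \in G} \gamma_{tg^{-1}} t, \qquad gz = \sum_{h \in G} \gamma_h gh = \sum_{t \in G} \gamma_{g^{-1}t} t,$$

and hence $\gamma_{tg^{-1}} = \gamma_{g^{-1}t}$, $\forall t \in G$. Setting $t = gh$, we obtain $\gamma_h = \gamma_{ghg^{-1}}$. This
means that

$$Z(\mathbb{C}[G]) = \langle z_1, z_2, \ldots, z_r \rangle_{\mathbb{C}},$$

where

$$z_i = \sum_{g \in g_i^G} g \qquad (3)$$

($g_1, g_2, \ldots, g_r$ are representatives of the conjugacy classes $g_i^G$ in $G$). Clearly,
$z_1, z_2, \ldots, z_r$ are linearly independent, and hence $\dim_{\mathbb{C}} Z(\mathbb{C}[G]) = r$.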


To every element $a \in A_i$ we associate the linear operator $L_a^{(i)}$ which acts on
the minimal left ideal $J_i = J_{i,1}$ according to the rule $L_a^{(i)}(v) = av$, $v \in J_i$. Since
obviously $L_{\lambda a}^{(i)} = \lambda L_a^{(i)}$, $L_{a+b}^{(i)} = L_a^{(i)} + L_b^{(i)}$, $L_{ab}^{(i)} = L_a^{(i)} L_b^{(i)}$, it follows that
$\varphi_i : a \mapsto L_a^{(i)}$ is a homomorphism from the algebra $A_i$ to the endomorphism algebra
$\mathrm{End}_{\mathbb{C}} J_i \cong M_{n_i}(\mathbb{C})$. Suppose that $0 \ne a \in \mathrm{Ker}\,\varphi_i$, i.e., $aJ_i = 0$. All of the left ideals
$J_{i,j}$ are $\mathbb{C}[G]$-isomorphic, and, if $\sigma_j : J_i \to J_{i,j}$ is an isomorphism, then

$$aJ_{i,j} = a\sigma_j(J_i) = \sigma_j(aJ_i) = \sigma_j(0) = 0.$$

Hence, $aA_i = aJ_{i,1} + \cdots + aJ_{i,n_i} = 0$, and in that case we also have $a\mathbb{C}[G] = 0$,
since $a \in A_i \Rightarrow aA_j = 0$ for all $j \ne i$. However, $ae = a \ne 0$. This contradiction
shows that $\mathrm{Ker}\,\varphi_i = 0$. Thus, $\varphi_i$ is a monomorphism, and, since $\dim A_i = n_i^2 =
\dim M_{n_i}(\mathbb{C})$, it follows that $\varphi_i$ is an isomorphism from $A_i$ to $M_{n_i}(\mathbb{C})$. Using



Proposition 1, we arrive at the following structure theorem for the group algebra $\mathbb{C}[G]$.


THEOREM 3. The group algebra $\mathbb{C}[G]$ of a finite group $G$ over the field of
complex numbers decomposes into a direct sum (2) of simple two-sided ideals which are
isomorphic to full matrix algebras:

$$\mathbb{C}[G] \cong M_{n_1}(\mathbb{C}) \oplus M_{n_2}(\mathbb{C}) \oplus \cdots \oplus M_{n_r}(\mathbb{C}).$$

In particular, the group algebra of an abelian group of order $n$ over the complex numbers is
isomorphic to the direct sum of $n$ copies of $\mathbb{C}$. □

COROLLARY (Burnside's theorem). Let $\Phi$ be an $n$-dimensional complex
irreducible matrix representation of a finite group $G$. Then there are $n^2$ of the matrices
$\Phi_g$, $g \in G$, which are linearly independent, i.e., $\langle \Phi_g \mid g \in G \rangle_{\mathbb{C}} = M_n(\mathbb{C})$. □


The structure of the center $Z(\mathbb{C}[G])$ as a commutative subalgebra of $\mathbb{C}[G]$ is
completely determined by the so-called structure constants: the integers $n_{ij}^k$ in the
relations

$$z_i z_j = \sum_{k=1}^r n_{ij}^k z_k. \qquad (4)$$

Keeping in mind the expression (3) for $z_i$, we easily see that $n_{ij}^k$ is the number of
pairs $(g, h)$, $g \in g_i^G$, $h \in g_j^G$, for which $gh = g_k$.


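We choose another basis in $Z(\mathbb{C}[G])$ as follows:

$$e_i = \frac{n_i}{|G|} \sum_{g \in G} \overline{\chi_i(g)}\, g = \frac{n_i}{|G|} \sum_{j=1}^r \overline{\chi_i(g_j)}\, z_j, \quad 1 \le i \le r. \qquad (5)$$

Here, as in §5 Ch. 8, $\chi_1, \ldots, \chi_r$ are the characters of the irreducible representations,
and $n_1, \ldots, n_r$ are the degrees of those representations. We go backwards in (5) using
the formula

$$z_j = |g_j^G| \sum_{i=1}^r \frac{\chi_i(g_j)}{n_i}\, e_i.$$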




To see this, one uses relation (4) §5 Ch. 8. That relation also shows that

$$e_1 + \cdots + e_r = \frac{1}{|G|} \sum_{g \in G} \Bigl( \sum_i n_i \overline{\chi_i(g)} \Bigr) g = \frac{1}{|G|} \sum_{g \in G} \Bigl( \sum_i \chi_i(e) \overline{\chi_i(g)} \Bigr) g = \frac{1}{|G|} \cdot |C_G(e)| \cdot e = e.$$

Next, applying the generalized orthogonality relation in Exercise 1 §4 Ch. 8, we find that

$$e_i e_j = \frac{n_i n_j}{|G|^2} \sum_{g, t \in G} \overline{\chi_i(g)}\, \overline{\chi_j(t)}\, gt = \frac{n_i n_j}{|G|} \sum_{h \in G} \Bigl( \frac{1}{|G|} \sum_{g \in G} \overline{\chi_i(g)}\, \overline{\chi_j(g^{-1}h)} \Bigr) h = \delta_{ij} \frac{n_i}{|G|} \sum_{h \in G} \overline{\chi_i(h)}\, h = \delta_{ij} e_i.$$

Thus, the central elements $e_i$, which are computed by formula (5), satisfy the relations

$$e_i^2 = e_i, \qquad e_i e_j = 0, \quad i \ne j, \qquad (6)$$

and for this reason are called the central orthogonal idempotents of the group algebra $\mathbb{C}[G]$.
The relation $e = e_1 + \cdots + e_r$ is the condition that this set is complete. If we set
$B_i = e_i \mathbb{C}[G]$, we immediately find that $B_i$ is a two-sided ideal in $\mathbb{C}[G]$ with identity
element $e_i$, and that we have the following direct sum decomposition:

$$\mathbb{C}[G] = B_1 \oplus B_2 \oplus \cdots \oplus B_r. \qquad (7)$$

It follows directly from (5) that

$$\chi_j(e_i) = \frac{n_i}{|G|} \sum_{g \in G} \overline{\chi_i(g)}\, \chi_j(g) = n_i \delta_{ij}.$$

Hence, $B_i$ contains the minimal left ideal $J_i \subset A_i$ corresponding to the character $\chi_i$.
Since $A_i$ and $B_i$ are two-sided ideals, we have $A_i \subset B_i$. Comparing (2) and (7), we
conclude that $A_i = B_i$. We have thereby proved a more complete version of Theorem 3.




THEOREM 4. The elements $e_i$, $1 \le i \le r$, which are computed using (5), form
a complete set of central orthogonal idempotents for the group algebra $\mathbb{C}[G]$ of a finite
group $G$. The simple component $e_i \mathbb{C}[G]$ in the direct sum decomposition

$$\mathbb{C}[G] = e_1 \mathbb{C}[G] \oplus e_2 \mathbb{C}[G] \oplus \cdots \oplus e_r \mathbb{C}[G]$$

is isomorphic to the full matrix algebra $M_{n_i}(\mathbb{C})$ and contains all minimal left ideals
corresponding to the character $\chi_i$. □
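Formula (5) and the relations (6) can be tested on the smallest non-abelian group. The sketch below assumes $G = S_3$ with its three irreducible characters (trivial, sign, and the character of degree 2 that equals the number of fixed points minus 1); this character table is a standard fact brought in for the illustration, the characters are real so conjugation is omitted, and the dictionary encoding and helper names are ours.

```python
from fractions import Fraction
from itertools import permutations

S3 = list(permutations(range(3)))
IDENTITY = (0, 1, 2)


def compose(g, h):
    """Product gh of two permutations (apply h first, then g)."""
    return tuple(g[h[i]] for i in range(3))


def multiply(A, B):
    """Product in the group algebra; elements are dicts {permutation: coefficient}."""
    C = {}
    for g, alpha in A.items():
        for h, beta in B.items():
            u = compose(g, h)
            C[u] = C.get(u, 0) + alpha * beta
    return {u: c for u, c in C.items() if c != 0}


def add(A, B):
    """Sum in the group algebra, dropping zero coefficients."""
    C = dict(A)
    for g, beta in B.items():
        C[g] = C.get(g, 0) + beta
    return {u: c for u, c in C.items() if c != 0}


def fixed_points(g):
    return sum(1 for i in range(3) if g[i] == i)


# Assumed character table of S_3 (real-valued, so no conjugation is needed).
CHARACTERS = [
    lambda g: 1,                                   # trivial character
    lambda g: -1 if fixed_points(g) == 1 else 1,   # sign character
    lambda g: fixed_points(g) - 1,                 # character of degree 2
]


def idempotent(chi):
    """e = (n / |G|) * sum_g chi(g) g, with n = chi(identity), as in formula (5)."""
    n = chi(IDENTITY)
    return {g: Fraction(n * chi(g), len(S3)) for g in S3 if chi(g) != 0}


E = [idempotent(chi) for chi in CHARACTERS]
```

With these definitions `multiply(E[i], E[j])` is `E[i]` when `i == j` and the empty dictionary otherwise, and adding the three elements with `add` gives `{IDENTITY: 1}`, in line with (6) and the completeness relation.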


All of group representation theory can be developed starting from the Wedderburn- 
Artin theorem (see Subsection 1) and the general structure theory of group algebras (the full 
story in the case of finite groups is contained in Theorem 4). We have gone in the other 
direction, essentially using Schur’s lemma as our point of departure, 


We conclude by proving a useful fact about the degrees of representations. 


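THEOREM 5. The degree $n$ of an irreducible complex representation $(\Phi, V)$ of
a finite group $G$ divides the order $|G|$.

Proof. Let $\tilde{\Phi}$ be the corresponding representation of the group algebra $\mathbb{C}[G]$.
By Schur's lemma (Proposition 2 of §3), the linear operator $\tilde{\Phi}(z_i)$, which commutes with
all of the $\Phi(g)$, $g \in G$, and hence belongs to $\mathrm{End}_{\mathbb{C}[G]}(V)$, must be a multiple of the
identity operator: $\tilde{\Phi}(z_i) = \omega_i \mathcal{E}$. We have

$$n\omega_i = \mathrm{tr}\, \omega_i \mathcal{E} = \mathrm{tr}\, \tilde{\Phi}(z_i) = \sum_{g \in g_i^G} \mathrm{tr}\, \Phi(g) = |g_i^G|\, \chi_\Phi(g_i),$$

and hence

$$\omega_i = \frac{|g_i^G|\, \chi_\Phi(g_i)}{n}.$$

Applying $\tilde{\Phi}$ to the relations (4), we obtain

$$\omega_i \omega_j = \sum_{k=1}^r n_{ij}^k \omega_k.$$

Hence, $\mathbb{Z}[\omega_i]$ is a submodule of the $\mathbb{Z}$-module of finite type $\mathbb{Z}[\omega_1, \ldots, \omega_r]$, and
so, by the results of Subsection 3 of §3, $\omega_i$ is an algebraic integer. Using the same
results, we find that

$$\frac{|G|}{n} = \frac{|G|}{n} (\chi_\Phi \mid \chi_\Phi)_G = \frac{1}{n} \sum_{g \in G} \chi_\Phi(g) \overline{\chi_\Phi(g)} = \frac{1}{n} \sum_{i=1}^r |g_i^G|\, \chi_\Phi(g_i) \overline{\chi_\Phi(g_i)} = \sum_{i=1}^r \omega_i \overline{\chi_\Phi(g_i)}$$

is an algebraic integer. Since a rational number which is an algebraic integer lies in
$\mathbb{Z}$, we conclude that $|G|/n \in \mathbb{Z}$. □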
4. Non-associative algebras. Let $A$ be any (not necessarily associative) algebra
of arbitrary dimension over a field $K$. To every triple of elements $x, y, z \in A$ we
associate their associator $(x, y, z) = (xy)z - x(yz)$. Depending on the identities satisfied
by the associators and other expressions, we obtain various types (called primitive classes
or varieties) of algebras. Examples of such classes of algebras are:

1) associative algebras: $(x, y, z) = 0$;
2) elastic algebras: $(x, y, x) = 0$;
3) alternative algebras: $(x, x, y) = (y, x, x) = 0$;
4) Jordan algebras: $(x, y, x^2) = 0$, $xy - yx = 0$.

Of course, we can use this axiomatic procedure endlessly. But what is remarkable
is that many classes of non-associative algebras have arisen naturally in fields far removed
from the science of algebra per se. The most notable examples are the Jordan algebras,
which arose from quantum mechanics, and the Lie algebras, which were originally designed
to describe (under certain conditions) the local structure of topological groups (Sophus Lie
was a XIX century mathematician). We have alluded to Lie algebras before in this book, and
so now will devote a brief discussion to them.

In a Lie algebra $L$ over a field $K$ the product of two elements $x$ and $y$ is
customarily denoted $[xy]$. By definition, in a Lie algebra the bilinear map $(x, y) \mapsto [xy]$
must satisfy the following two properties:

(i) $[xx] = 0$ ($[xy] = -[yx]$, the property of anti-commutativity);
(ii) $[[xy]z] + [[yz]x] + [[zx]y] = 0$ (the Jacobi identity).


Example 1. Let $A$ be an associative algebra over a field $K$. We can give the
vector space $A$ the structure of a Lie algebra, which we denote $L(A)$, by setting
$[xy] = xy - yx$. Obviously, $[xx] = 0$. Furthermore,

$$[[xy]z] = (xy - yx)z - z(xy - yx) = xyz - yxz - zxy + zyx,$$
$$[[yz]x] = (yz - zy)x - x(yz - zy) = yzx - zyx - xyz + xzy,$$
$$[[zx]y] = (zx - xz)y - y(zx - xz) = zxy - xzy - yzx + yxz.$$

By simply adding these expressions, we obtain the Jacobi identity.
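The cancellation above can be observed numerically for the associative algebra of square matrices. A minimal sketch, assuming integer matrices stored as nested lists; the three sample matrices and the helper names are arbitrary choices of ours.

```python
def mat_mul(P, Q):
    """Ordinary product of two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(P, Q):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def bracket(X, Y):
    """The Lie product [xy] = xy - yx."""
    XY, YX = mat_mul(X, Y), mat_mul(Y, X)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(XY, YX)]


def jacobi_sum(X, Y, Z):
    """[[xy]z] + [[yz]x] + [[zx]y]."""
    total = mat_add(bracket(bracket(X, Y), Z), bracket(bracket(Y, Z), X))
    return mat_add(total, bracket(bracket(Z, X), Y))


X = [[1, 2, 0], [0, -1, 3], [4, 0, 2]]
Y = [[0, 1, 1], [2, 0, -1], [1, 5, 0]]
Z = [[3, 0, -2], [1, 1, 0], [0, 4, 1]]
```

For these matrices `jacobi_sum(X, Y, Z)` is the zero matrix, as is `bracket(X, X)`.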


For example, let $A = \mathrm{End}_K(V) = \mathcal{L}(V)$ be the algebra of all linear operators on a
finite dimensional $K$-vector space $V$. Any homomorphism $\Phi$ from a Lie algebra $L$ to
the Lie algebra $L(\mathcal{L}(V))$ is called a representation of the Lie algebra $L$. The
representation space $V$ is also called an $L$-module (or a module over the Lie algebra $L$).
Formally, an $L$-module is given by three axioms:

(L1) $x(\alpha u + \beta v) = \alpha xu + \beta xv$;
(L2) $(\alpha x + \beta y)v = \alpha xv + \beta yv$;
(L3) $[xy]v = x(yv) - y(xv)$.


Example 2. If $A$ is any (not necessarily associative) algebra over a field $K$, then by
a differentiation of the algebra $A$ we mean a differentiation of the ring $A$ (see the
definition in Subsection 3 of §1 Ch. 6) which commutes with scalar multiplication, i.e.,
$\mathcal{D}(\lambda a) = \lambda \mathcal{D}(a)$ for $\lambda \in K$ and $a \in A$. Example 1 and Exercise 8 in §1 Ch. 6 show
that the formula $[\mathcal{D}_1 \mathcal{D}_2] = \mathcal{D}_1 \mathcal{D}_2 - \mathcal{D}_2 \mathcal{D}_1$ gives the $K$-vector space $\mathrm{Der}(A)$ a Lie
algebra structure. For example, if $A = K[X]$ is a polynomial algebra, then $\mathrm{Der}(A)$
consists of the differentiations $\partial_u$, $u \in A$, which act according to the rule:
$\partial_u(f) = u\, df/dX = uf'$. By definition, we have:

$$[\partial_u \partial_v](f) = \partial_u(\partial_v f) - \partial_v(\partial_u f) = \partial_u(vf') - \partial_v(uf') = u(v'f' + vf'') - v(u'f' + uf'') = (uv' - u'v)f'.$$

Consequently, $[\partial_u \partial_v] = \partial_{uv' - u'v}$, and we see that the algebra $\mathrm{Der}(A)$ is isomorphic
to the infinite dimensional Lie algebra $(A, [\ ])$ having $A$ as its underlying space and
multiplication law $[uv] = uv' - u'v$. Setting $A_{(i)} = \langle X^{i+1} \rangle_K$, we obtain a direct sum
decomposition of $A$

$$A = A_{(-1)} \oplus A_{(0)} \oplus A_{(1)} \oplus A_{(2)} \oplus \cdots,$$

which shows that $A$ is a graded Lie algebra: $[A_{(i)} A_{(j)}] \subset A_{(i+j)}$ (compare with
Example 2 in Subsection 1). The Lie algebra $(A, [\ ])$ acts on the vector space $A$
in two ways: 1) $(a, f) \mapsto af'$ (the natural action); 2) $(a, f) \mapsto af' - a'f$ (the action
by adjoint endomorphisms). These two actions give two non-isomorphic $(A, [\ ])$-modules.


Example 3. The skew-hermitian matrices $K_1, K_2, K_3$ with zero trace that
were constructed in Exercise 3 of §1 Ch. 7 for the group $SU(2)$, satisfy the relations

$$[K_1 K_2] = K_3, \qquad [K_2 K_3] = K_1, \qquad [K_3 K_1] = K_2,$$

which are exactly the rules for the cross-product of vectors in $\mathbb{R}^3$ ($[K_i K_j] = K_i K_j - K_j K_i$
is the commutator of the matrices in $M_2(\mathbb{C})$; see Example 1). Hence, the three-dimensional
real space $\langle K_1, K_2, K_3 \rangle_{\mathbb{R}}$ has been given a Lie algebra structure.


The general theory of representations of compact groups tells us that there is a
one-to-one correspondence between the irreducible representations of $SU(2)$ and those of its
Lie algebra $su(2) = \langle K_1, K_2, K_3 \rangle_{\mathbb{R}}$. Intuitively, we can see this by taking into account
the continuity of the group representation and considering the linear operator
$\lim_{t \to 0} \frac{1}{t}\bigl(\Phi(g_t) - \mathcal{E}\bigr)$
in the linear span of the operators $\Phi(g)$ (where $g_t$ is an element of $SU(2)$ which
depends in a differentiable way on $t$ and for which $g_0 = e$); this limit operator is actually
in the algebra $su(2)$. In order to see that the list of irreducible representations of $SU(2)$
in §6 Ch. 8 is complete, we must verify that, for any natural number $n$, there is up to
isomorphism exactly one irreducible $su(2)$-module of dimension $n$ over $\mathbb{C}$. To do
this, it is useful from the very beginning to pass from the real Lie algebra $su(2)$ to its
"complexification", which is the Lie algebra

$$L = sl(2) = su(2) \otimes_{\mathbb{R}} \mathbb{C}$$

of all complex $2 \times 2$ matrices having zero trace. The basis elements

$$e_{-1} = -iK_1 + K_2, \qquad e_0 = -2iK_3, \qquad e_1 = -iK_1 - K_2$$

of the Lie algebra $L$ multiply together according to the rule

$$[e_1 e_{-1}] = e_0, \qquad [e_0 e_{-1}] = -2e_{-1}, \qquad [e_0 e_1] = 2e_1. \qquad (8)$$


Forgetting for a moment how $L$ originated, we may take $L = \langle e_{-1}, e_0, e_1 \rangle_{\mathbb{C}}$ to be the
abstract three-dimensional Lie algebra over $\mathbb{C}$ with multiplication table (8). It is easy
to verify that $L$ is a simple Lie algebra. Hence, any irreducible $L$-module of
dimension $> 1$ is faithful.

First suppose that $V \ne 0$ is an arbitrary $L$-module of finite dimension over $\mathbb{C}$,
and let $E_{-1}, E_0, E_1$ be the linear operators on $V$ corresponding to $e_{-1}, e_0, e_1$,
respectively. The representation theory of Lie algebras has its own terminology, which we
shall adhere to. If $V^\lambda = \{v \in V \mid E_0 v = \lambda v\}$ is the eigen-subspace of $E_0$ in $V$ with
eigen-value $\lambda \in \mathbb{C}$, then the vectors in $V^\lambda$ are customarily called vectors of weight $\lambda$.
The dimension of $V^\lambda$ is called the multiplicity of the weight $\lambda$.


LEMMA 1. If $v \in V^\lambda$, then $E_1 v \in V^{\lambda+2}$ and $E_{-1} v \in V^{\lambda-2}$.

Proof. By the axiom (L3) we have

$$E_0(E_1 v) = [E_0 E_1]v + E_1(E_0 v) = 2E_1 v + E_1(\lambda v) = (\lambda + 2)E_1 v,$$

so that, by definition, $E_1 v \in V^{\lambda+2}$. We similarly show that $E_{-1} v \in V^{\lambda-2}$. □

We know from linear algebra that vectors corresponding to different eigen-values are
linearly independent. Hence the sum $W = \sum_\lambda V^\lambda \subset V$ is a direct sum. It further follows
from Lemma 1 that $W = \sum_\lambda V^\lambda$ is an $L$-submodule of $V$. Since $W \ne 0$, we must
have $W = V$ if $V$ is an irreducible $L$-module.

A vector $v_0 \in V$ is called a highest weight vector of weight $\lambda$ if $v_0 \ne 0$ and

$$E_1 v_0 = 0, \qquad E_0 v_0 = \lambda v_0.$$


LEMMA 2. Any finite-dimensional $L$-module $V$ has a highest weight vector.

Proof. Take an arbitrary non-zero vector $v$ of weight $\mu$, and construct the
sequence of vectors $v, E_1 v, E_1^2 v, \ldots$ with weights $\mu, \mu + 2, \mu + 4, \ldots$ (see
Lemma 1). Since $\dim V < \infty$, it follows that $E_1^{m+1} v = 0$ for some $m$. If we take
$m$ to be the least integer with this property, we can set $v_0 = E_1^m v$, $\lambda = \mu + 2m$. □


# 
As an example, we consider the (n+1)-dimensional $\mathbb{C}$-vector space $V_n$ with
fixed basis $v_0, v_1, \ldots, v_n$. We define the operators $E_{-1}$, $E_0$ and $E_1$ by the
formulas
$$\begin{aligned} E_{-1} v_m &= (m+1)\, v_{m+1}, \\ E_0 v_m &= (n - 2m)\, v_m, \\ E_1 v_m &= (n - m + 1)\, v_{m-1}, \end{aligned} \tag{9}$$
where we set $v_{-1} = 0 = v_{n+1}$. A direct computation shows that we have:
$$\begin{aligned} E_1(E_{-1} v_m) - E_{-1}(E_1 v_m) &= E_0 v_m, \\ E_0(E_{-1} v_m) - E_{-1}(E_0 v_m) &= -2E_{-1} v_m, \\ E_0(E_1 v_m) - E_1(E_0 v_m) &= 2E_1 v_m, \end{aligned}$$
which corresponds to the multiplication table (8) and the axioms of an L-module. Since
$E_1 v_0 = (n+1) v_{-1} = 0$ and $E_0 v_0 = n v_0$, it follows that $v_0$ is a highest weight vector
of weight n, and the entire space $V_n$ can be written as the direct sum
$$V_n = V_n^{n} \oplus V_n^{n-2} \oplus \cdots \oplus V_n^{-n} \tag{10}$$
of one-dimensional weight subspaces $V_n^{n-2m} = \langle v_m \rangle$ (each weight has multiplicity 1).
If we had a non-zero submodule U in $V_n$, we could take any eigen-vector $u \in U$ of
$E_0$. Then, by (10), $u = \alpha v_m$ for some m, with $\alpha \neq 0$. By successively applying $E_1$ (see (9)),
we would obtain $v_{m-1} \in U, \ldots, v_0 \in U$, and by successively applying $E_{-1}$ to $v_0$
we would obtain all of the other $v_i$. Hence, $U = V_n$, and $V_n$ is an irreducible
L-module.
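
The "direct computation" mentioned above can also be checked mechanically. The following is a minimal sketch, not part of the original text: it builds the matrices of $E_{-1}, E_0, E_1$ in the basis $v_0, \ldots, v_n$ from formulas (9) (column m holds the image of $v_m$) and tests the three commutation relations of table (8). The function names `module_matrices`, `commutator` and `satisfies_table_8` are illustrative assumptions.

```python
def module_matrices(n):
    """Matrices of E_{-1}, E_0, E_1 on V_n in the basis v_0..v_n, per formulas (9).

    Column m of each matrix holds the coordinates of the image of v_m.
    """
    size = n + 1
    e_minus = [[0] * size for _ in range(size)]
    e_zero = [[0] * size for _ in range(size)]
    e_plus = [[0] * size for _ in range(size)]
    for m in range(size):
        e_zero[m][m] = n - 2 * m              # E_0 v_m = (n - 2m) v_m
        if m + 1 <= n:
            e_minus[m + 1][m] = m + 1         # E_{-1} v_m = (m + 1) v_{m+1}
        if m - 1 >= 0:
            e_plus[m - 1][m] = n - m + 1      # E_1 v_m = (n - m + 1) v_{m-1}
    return e_minus, e_zero, e_plus


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def satisfies_table_8(n):
    """True if the operators from (9) obey the multiplication table (8)."""
    e_minus, e_zero, e_plus = module_matrices(n)
    return (commutator(e_plus, e_minus) == e_zero
            and commutator(e_zero, e_minus) == scale(-2, e_minus)
            and commutator(e_zero, e_plus) == scale(2, e_plus))
```

For instance, `satisfies_table_8(n)` can be evaluated for a few small values of n; for n = 1 the matrices returned by `module_matrices` are the usual 2×2 matrices of the natural module.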


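We note that $V_0$ is the trivial (one-dimensional) module, and $V_1$ is the module
corresponding to the natural definition of the Lie algebra L: in the basis $v_0, v_1$ the
operators $E_{-1}, E_0, E_1$ have the matrices (computed from (9) with n = 1)
$$E_{-1} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad E_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad E_1 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$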


The following theorem answers the remaining question before us. 
THEOREM 6. Every (n+1)-dimensional irreducible L-module V over $\mathbb{C}$
is isomorphic to $V_n$.

Proof. By Lemma 2, our module V has some highest weight vector $v_0$ of
weight $\lambda$. We set $v_{-1} = 0$ and
$$v_m = \frac{1}{m!} E_{-1}^m v_0 = \frac{1}{m!} E_{-1}(E_{-1}(\cdots(E_{-1} v_0)\cdots))$$
for $m \geq 0$. We claim that the following formulas hold for any $m \geq 0$:
$$\begin{aligned} E_{-1} v_m &= (m+1)\, v_{m+1}, \\ E_0 v_m &= (\lambda - 2m)\, v_m, \\ E_1 v_m &= (\lambda - m + 1)\, v_{m-1}. \end{aligned} \tag{10'}$$

In fact, for m = 0 the formulas (10') reduce to the definitions of the highest weight vector
$v_0$ and the vector $v_1$. We now use induction on m: a) the vector $v_{m+1}$ is defined
by the formula $E_{-1} v_m = (m+1) v_{m+1}$; b) the formula $E_0 v_m = (\lambda - 2m) v_m$ follows
from Lemma 1; c) if we already know that $E_1 v_{m-1} = (\lambda - m + 2) v_{m-2}$, then we
obtain the last formula in (10') by dividing both sides of the following equality by m:
$$\begin{aligned} m E_1 v_m &= E_1(E_{-1} v_{m-1}) = [E_1, E_{-1}] v_{m-1} + E_{-1}(E_1 v_{m-1}) \\ &= E_0 v_{m-1} + (\lambda - m + 2) E_{-1} v_{m-2} \\ &= (\lambda - 2m + 2) v_{m-1} + (\lambda - m + 2)(m - 1) v_{m-1} \\ &= m(\lambda - m + 1) v_{m-1}. \end{aligned}$$


If the vectors $v_0, v_1, \ldots, v_r$ are non-zero for a certain r, then they must be
linearly independent, since they have different weights. On the other hand, since V is
irreducible, the submodule generated by the vector $v_0$ is all of V, and, since
$\dim V = n+1$, it follows that $V = \langle v_0, v_1, \ldots, v_n \rangle$ and $v_{n+1} = v_{n+2} = \cdots = 0$.
In particular,
$$0 = E_1 v_{n+1} = (\lambda - n) v_n \implies \lambda = n$$
(note the interesting fact that $\dim V < \infty$ implies that $\lambda$ is a non-negative integer).
Substituting $\lambda = n$ in (10') and taking into account the notation we are using, we
arrive at the formulas (9) which define the L-module $V_n$. Hence, $V \cong V_n$. □


EXERCISES 


1. How many solutions does the equation $x^2 + 1 = 0$ have in the quaternion
algebra $\mathbb{H}$?


2. The algebra of generalized quaternions over $\mathbb{Q}$. Show that the multiplication
table

with $n, m \in \mathbb{Z}$, m and n non-zero, gives the structure of an associative algebra with
unit to the four-dimensional vector space $H(n, m) = \langle 1, e_1, e_2, e_3 \rangle_{\mathbb{Q}}$ over $\mathbb{Q}$. To do
this, use the representation
$$x = x_0 + x_1 e_1 + x_2 e_2 + x_3 e_3 \mapsto A_x.$$
The determinant $\det A_x = x_0^2 - x_1^2 n - x_2^2 m + x_3^2 nm = N(x)$ is called the norm of the
element x. Prove that H(n, m) is a division algebra provided that the norm of any
non-zero x € H(n,m) is non-zero, Using the ideas and results of Exercise 7 We 6 


show that, if p is a prime congruent to $\pm 3 \pmod 8$, then H(2, p) is a division
algebra.


3. Consider $\mathbb{F}_{2^n}$ as an n-dimensional vector space V over $\mathbb{F}_2$. Along
with the addition operation coming from $\mathbb{F}_{2^n}$, introduce a multiplication operation on V
by letting $(x, y) \mapsto x \circ y = \sqrt{xy}$. Here $x \mapsto \sqrt{x}$ is the automorphism of $\mathbb{F}_{2^n}$ which
is inverse to $x \mapsto x^2$; thus, $\sqrt{x + y} = \sqrt{x} + \sqrt{y}$. Show that $(V, +, \circ)$ is a commutative,
non-associative algebra over $\mathbb{F}_2$ with the properties: a) V has no zero divisors and
does not have a unit element; b) the equation $a \circ x = b$ with $a \neq 0$ has a unique
solution; c) the automorphism group Aut(V) acts transitively on $V \setminus \{0\}$.


547 


4. Show by a direct computation that the following identity holds in any algebra:
$$a(x, y, z) + (a, x, y)z = (ax, y, z) - (a, xy, z) + (a, x, yz).$$
Prove that, if an algebra A with unit element 1 over a field K has the property
that $(x, y, z) \in K \cdot 1$ for all associators (x, y, z), then A is an associative algebra.


Appendix. The Jordan Normal Form 
of a Matrix 


We are including here a discussion of this particular corner of linear algebra in order 
to emphasize its similarity with §5 Ch. 7, where we have the classification of finite abelian
groups. We decided not to insist in §3 Ch. 9 on the unified point of view of modules over
principal ideal domains, since different readers may find it more convenient to have direct 


proofs of the necessary facts concerning groups or concerning linear operators. 


1. If we want to understand the action of a given linear operator $\mathcal{A}: V \to V$, it is
natural for us to try to find a basis of V which is most compatible with $\mathcal{A}$. In other words,
in the class of similar matrices $C^{-1}AC$ corresponding to the operator $\mathcal{A}$, we would like
to find a matrix with the simplest possible form. Solving this problem depends on the nature
of the field K over which the vector space V is defined. In what follows we shall assume
that K is the complex number field $\mathbb{C}$ or any other algebraically closed field.


Let $n = \dim V$, and let $\lambda_1, \ldots, \lambda_n$ be the roots of the characteristic polynomial
$$f_{\mathcal{A}}(t) = f_A(t) = \det(tE - A) = \prod_{i=1}^{n} (t - \lambda_i),$$
so that, in particular,
$$\operatorname{tr} A = \sum_{i=1}^{n} \lambda_i, \qquad \det A = \prod_{i=1}^{n} \lambda_i.$$
The complex numbers $\lambda_i$ are also the eigen-values of the linear operator $\mathcal{A}$: the subspaces
$$V^{\lambda_i} = \{v \in V \mid \mathcal{A}v = \lambda_i v\}$$
are non-zero, and the non-zero vectors in these subspaces are called the eigen-vectors of
$\mathcal{A}$. The set $\operatorname{Spec}(\mathcal{A})$ of all pair-wise distinct eigen-values (characteristic roots) of the
operator $\mathcal{A}$ is called its spectrum. We similarly speak of the spectrum Spec(A) of the
matrix A.


We note the following facts. 


(i) Eigen-vectors having different eigen-values are linearly independent. The sum
$\sum_{\lambda \in \operatorname{Spec}(\mathcal{A})} V^\lambda$ is a direct sum (in general, $\sum V^\lambda$ does not necessarily coincide with V).

(ii) The matrix of a linear operator $\mathcal{A}$ can always be reduced to triangular form
(within the class of similar matrices).

The simplest way to see these facts is to use induction. Take a one-dimensional
$\mathcal{A}$-invariant subspace $\langle e_1 \rangle$ (where $\mathcal{A}e_1 = \lambda_1 e_1$), pass to the quotient space
$\bar{V} = V/\langle e_1 \rangle = \{\bar{v} = v + \langle e_1 \rangle \mid v \in V\}$, which has dimension n-1, and to the quotient
operator $\bar{\mathcal{A}}$ defined by $\bar{\mathcal{A}}\bar{v} = \overline{\mathcal{A}v}$. In $\bar{V}$ choose a basis $\bar{e}_2, \ldots, \bar{e}_n$ which brings
$\bar{\mathcal{A}}$ to triangular form (using the induction assumption), and then return to V to obtain:


550 


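(iii) (The Hamilton-Cayley theorem). A linear operator $\mathcal{A}$ and the corresponding
matrix A (in any basis) are annihilated by their characteristic polynomial.

Since this assertion does not depend on the choice of basis, to prove it it is useful to
make use of property (ii). We consider the chain of $\mathcal{A}$-invariant subspaces
$V = V_n \supset V_{n-1} \supset \cdots \supset V_1 \supset V_0 = 0$, where $V_k = \langle e_1, \ldots, e_k \rangle$. Since $(\mathcal{A} - \lambda_k \mathcal{E}) e_k \in V_{k-1}$,
it follows that $(\mathcal{A} - \lambda_k \mathcal{E}) V_k \subset V_{k-1}$, and hence
$$\begin{aligned} f_{\mathcal{A}}(\mathcal{A})V &= \prod_{i=1}^{n} (\mathcal{A} - \lambda_i \mathcal{E}) V = (\mathcal{A} - \lambda_1 \mathcal{E}) \cdots (\mathcal{A} - \lambda_{n-1} \mathcal{E})(\mathcal{A} - \lambda_n \mathcal{E}) V_n \\ &\subset (\mathcal{A} - \lambda_1 \mathcal{E}) \cdots (\mathcal{A} - \lambda_{n-1} \mathcal{E}) V_{n-1} \subset \cdots \subset (\mathcal{A} - \lambda_1 \mathcal{E}) V_1 = 0. \end{aligned}$$
But $f_{\mathcal{A}}(\mathcal{A})V = 0$ if and only if $f_{\mathcal{A}}(\mathcal{A}) = 0$. □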


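(iv) The minimal polynomial $h_{\mathcal{A}}(t) = h_A(t)$ of an operator $\mathcal{A}$ (i.e., the monic
polynomial of minimal degree $m \leq n$ which annihilates $\mathcal{A}$ and A) is a divisor of the
characteristic polynomial $f_{\mathcal{A}}(t)$ and is divisible by all linear factors $t - \lambda$, $\lambda \in \operatorname{Spec}(\mathcal{A})$.

The division algorithm, which gives $f_{\mathcal{A}}(t) = q(t) h_{\mathcal{A}}(t) + r(t)$,
$\deg r(t) < \deg h_{\mathcal{A}}(t)$, along with the fact that $f_{\mathcal{A}}(\mathcal{A}) = 0 = h_{\mathcal{A}}(\mathcal{A})$, allow us to conclude that
$r(\mathcal{A}) = 0$, and so $r(t) = 0$. Next, if $\lambda$ is an eigen-value of $\mathcal{A}$, then, choosing $v \neq 0$ so
that $\mathcal{A}v = \lambda v$, we have: $0 = h_{\mathcal{A}}(\mathcal{A})v = h_{\mathcal{A}}(\lambda)v$, so that $h_{\mathcal{A}}(\lambda) = 0$, and hence
$(t - \lambda) \mid h_{\mathcal{A}}(t)$. □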
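
Property (iii) is easy to test on a concrete matrix. The sketch below is an illustration under stated assumptions rather than anything from the original text: it computes the coefficients of $\det(tE - A)$ with the Faddeev-LeVerrier recursion (a technique swapped in here for convenience, since it needs only matrix products and traces) and then evaluates the polynomial at A by Horner's rule. The names `char_poly` and `poly_at_matrix` are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def char_poly(a):
    """Coefficients [c_0, ..., c_n] of det(tE - A), via Faddeev-LeVerrier."""
    n = len(a)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)                       # the polynomial is monic
    m = [[Fraction(0)] * n for _ in range(n)]     # M_0 = 0
    for k in range(1, n + 1):
        am = matmul(a, m)
        # M_k = A * M_{k-1} + c_{n-k+1} * E
        m = [[am[i][j] + (coeffs[n - k + 1] if i == j else 0)
              for j in range(n)] for i in range(n)]
        am = matmul(a, m)
        # c_{n-k} = -tr(A * M_k) / k
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
    return coeffs


def poly_at_matrix(coeffs, a):
    """Evaluate c_0 E + c_1 A + ... + c_n A^n by Horner's rule."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, a)
        for i in range(n):
            result[i][i] += c
    return result
```

With the sample matrix A = [[2, 1, 0], [0, 2, 0], [1, 3, 5]], whose characteristic polynomial is $(t-2)^2(t-5)$, `poly_at_matrix(char_poly(A), A)` is the zero matrix, as the Hamilton-Cayley theorem predicts.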


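Example. A linear operator $\mathcal{A}: V \to V$ is called nilpotent if $\mathcal{A}^m = 0$ for some
positive integer m; m is called the index of nilpotence if $\mathcal{A}^{m-1} \neq 0$. We claim: if
$\mathcal{A}^{m-1} v \neq 0$, then the vectors $v, \mathcal{A}v, \ldots, \mathcal{A}^{m-1}v$ are linearly independent. In fact, any
non-trivial linear dependence relation has the form
$$\alpha_k \mathcal{A}^k v + \alpha_{k+1} \mathcal{A}^{k+1} v + \cdots + \alpha_{m-1} \mathcal{A}^{m-1} v = 0, \qquad \alpha_k \neq 0, \quad 0 \leq k \leq m-1.$$
Applying the operator $\mathcal{A}^{m-k-1}$ to both sides of this equation gives us $\alpha_k \mathcal{A}^{m-1} v = 0$, which
contradicts the choice of v.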
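Thus, the index of nilpotence m of $\mathcal{A}$ does not exceed $n = \dim V$. Suppose
that m = n, and that $\mathcal{A}^{n-1} v \neq 0$. We introduce the following notation for the basis
vectors: $v_1 = \mathcal{A}^{n-1}v$, $v_2 = \mathcal{A}^{n-2}v$, ..., $v_{n-1} = \mathcal{A}v$, $v_n = v$. Then $\mathcal{A}v_k = v_{k-1}$ for
k > 1, and $\mathcal{A}v_1 = 0$, so that the matrix of $\mathcal{A}$ in the basis $v_1, \ldots, v_n$ is the so-
called "Jordan cell"
$$J_{n,0} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}.$$
For example, if $V = \langle 1, X, X^2, \ldots, X^{n-1} \rangle$ is the space of polynomials of
degree < n over $\mathbb{C}$, and if $\mathcal{A} = d/dX$ is the differentiation operator, then the matrix
of $\mathcal{A}$ in the basis $\{v_i\}$, where $v_i = X^{i-1}/(i-1)!$, is precisely $J_{n,0}$.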
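
The differentiation example can be verified directly. The sketch below is a small illustration, assuming the scaled basis $v_i = X^{i-1}/(i-1)!$: polynomials are stored as coefficient lists, and the function names `derivative`, `differentiation_matrix` and `jordan_cell` are hypothetical helpers introduced only for this check.

```python
from fractions import Fraction
from math import factorial


def derivative(coeffs):
    """Derivative of a polynomial given by coefficients of 1, X, ..., X^(n-1)."""
    n = len(coeffs)
    return [(j + 1) * coeffs[j + 1] for j in range(n - 1)] + [Fraction(0)]


def differentiation_matrix(n):
    """Matrix of d/dX in the basis v_i = X^(i-1)/(i-1)!, i = 1..n."""
    columns = []
    for i in range(n):                        # index i stands for v_{i+1}
        v = [Fraction(0)] * n
        v[i] = Fraction(1, factorial(i))      # v_{i+1} = X^i / i!
        dv = derivative(v)
        # the coordinate along v_{j+1} is (coefficient of X^j) * j!
        columns.append([dv[j] * factorial(j) for j in range(n)])
    # column i of the matrix holds the coordinates of the image of v_{i+1}
    return [[columns[col][row] for col in range(n)] for row in range(n)]


def jordan_cell(m, lam):
    """The m x m upper Jordan cell with eigen-value lam."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(m)]
            for i in range(m)]
```

Comparing `differentiation_matrix(n)` with `jordan_cell(n, 0)` for small n confirms that the two matrices coincide.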


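More generally, by the m × m (upper) Jordan cell corresponding to the eigen-value
$\lambda$, we mean the matrix
$$J_{m,\lambda} = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}.$$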


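We note that $J_{m,\lambda} - \lambda E = J_{m,0}$ is a nilpotent matrix with index of nilpotence m. We
further see that $(t - \lambda)^m$ is the minimal polynomial of the Jordan cell $J_{m,\lambda}$, and $\lambda$ is
the only eigen-value: $\operatorname{Spec}(J_{m,\lambda}) = \{\lambda\}$.
If u(t) is any polynomial, then we have
$$u(J_{m,\lambda}) = \begin{pmatrix} u(\lambda) & \dfrac{u'(\lambda)}{1!} & \cdots & \dfrac{u^{(m-1)}(\lambda)}{(m-1)!} \\ 0 & u(\lambda) & \cdots & \dfrac{u^{(m-2)}(\lambda)}{(m-2)!} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u(\lambda) \end{pmatrix},$$
so that it is much easier to operate with $J_{m,\lambda}$ than with arbitrary matrices.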
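
The formula for a polynomial in a Jordan cell lends itself to a quick numerical check. The following sketch is illustrative only: it evaluates u at $J_{m,\lambda}$ by Horner's rule and compares the result with the matrix whose entry (i, j), $j \geq i$, is $u^{(j-i)}(\lambda)/(j-i)!$. The polynomial is passed as a list of integer coefficients $[c_0, c_1, \ldots]$, and all function names are assumptions made for this example.

```python
from math import comb


def jordan_cell(m, lam):
    """The m x m upper Jordan cell with eigen-value lam."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(m)]
            for i in range(m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def poly_at_matrix(coeffs, a):
    """Evaluate c_0 E + c_1 A + ... by Horner's rule."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, a)
        for i in range(n):
            result[i][i] += c
    return result


def taylor_coefficient(coeffs, lam, k):
    """u^(k)(lam) / k! for u(t) = sum of c_j t^j."""
    return sum(comb(j, k) * c * lam ** (j - k)
               for j, c in enumerate(coeffs) if j >= k)


def expected_matrix(coeffs, m, lam):
    """Upper triangular matrix with u^(j-i)(lam)/(j-i)! in position (i, j)."""
    return [[taylor_coefficient(coeffs, lam, j - i) if j >= i else 0
             for j in range(m)] for i in range(m)]
```

For $u(t) = 2t^3 - 3t + 1$, m = 4 and $\lambda = 2$, the first row of both matrices is (11, 21, 12, 2), that is, $u(2)$, $u'(2)$, $u''(2)/2!$, $u'''(2)/3!$.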


FUNDAMENTAL THEOREM. Every n × n matrix A over an algebraically
closed field K (for example, over $\mathbb{C}$) is similar to a direct sum of Jordan cells. That
is, there exists a non-singular matrix C such that
$$C^{-1}AC = \begin{pmatrix} J_{m_1,\lambda_1} & & \\ & \ddots & \\ & & J_{m_s,\lambda_s} \end{pmatrix}$$
(this is called the Jordan normal form J(A) of the matrix A). The Jordan normal form is
unique except for the order of the Jordan cells.

The fundamental theorem is proved in three steps in Subsections 2, 3, and 4.


Since the minimal polynomials of similar matrices are the same, it follows from the
fundamental theorem and the above remarks on Jordan cells that
$$h_A(t) = (t - \lambda_1)^{m_1} \cdots (t - \lambda_p)^{m_p},$$
where $\{\lambda_1, \ldots, \lambda_p\} = \operatorname{Spec}(A)$, and $m_k$ is the maximal order of the Jordan cells
corresponding to the eigen-value $\lambda_k$.

Clearly, a matrix A is diagonalizable (i.e., similar to a matrix of the form
$\operatorname{diag}\{\lambda_1, \ldots, \lambda_n\}$) if and only if there are no Jordan cells of order greater than 1 in
J(A). Hence, we have the following useful criterion.

COROLLARY. A square matrix A over $\mathbb{C}$ is diagonalizable if and only if its
minimal polynomial $h_A(t)$ has no multiple roots.

Note that to apply this corollary, i.e., to find $h_A(t)$, we do not have to reduce A
to Jordan normal form.
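
One practical way to use the corollary, sketched here under the assumption that the spectrum of A is already known: since $h_A(t)$ is divisible by every $t - \lambda$ with $\lambda \in \operatorname{Spec}(A)$, it has no multiple roots exactly when the product of the matrices $A - \lambda E$ over the distinct eigen-values vanishes. The function name `is_diagonalizable` and its `spectrum` argument are illustrative; the caller is responsible for supplying the correct eigen-values.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_diagonalizable(a, spectrum):
    """Test whether prod (A - lam E) = 0 over the distinct eigen-values lam.

    `spectrum` must list all eigen-values of `a` (repetitions are ignored).
    """
    n = len(a)
    product = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for lam in set(spectrum):
        shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)]
                   for i in range(n)]
        product = matmul(product, shifted)
    return all(x == 0 for row in product for x in row)
```

For example, the Jordan cell [[2, 1], [0, 2]] with spectrum {2} fails the test, while [[1, 1], [0, 2]] with spectrum {1, 2} passes it.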


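2. The set of vectors
$$V(\lambda) = \{v \in V \mid (\mathcal{A} - \lambda\mathcal{E})^k v = 0 \text{ for some } k\}$$
is called the root space corresponding to the eigen-value $\lambda \in \operatorname{Spec}(\mathcal{A})$. It is easy to check
that $V(\lambda)$ is really a subspace. Namely, suppose that $u, v \in V(\lambda)$ with $(\mathcal{A} - \lambda\mathcal{E})^s u = 0$
and $(\mathcal{A} - \lambda\mathcal{E})^t v = 0$. If $m = \max\{s, t\}$, then
$$(\mathcal{A} - \lambda\mathcal{E})^m (\alpha u + \beta v) = \alpha(\mathcal{A} - \lambda\mathcal{E})^m u + \beta(\mathcal{A} - \lambda\mathcal{E})^m v = 0,$$
and so $\alpha u + \beta v \in V(\lambda)$ for any $\alpha, \beta \in \mathbb{C}$. Since $V(\lambda)$ contains an eigen-vector
corresponding to $\lambda$, it follows that $V(\lambda) \neq 0$. Furthermore, we have $V^\lambda \subset V(\lambda)$, but
the two spaces do not necessarily coincide, as we see from the example of a nilpotent
operator $\mathcal{A}$ having index of nilpotence n. In that case $\lambda = 0$ is the only eigen-value,
and $\dim V^0 = 1$, but $V(0) = V$.

Since $\dim V(\lambda) \leq n$ and the restriction of $\mathcal{A} - \lambda\mathcal{E}$ to $V(\lambda)$ is a nilpotent
operator, it follows that
$$V(\lambda) = \{v \in V \mid (\mathcal{A} - \lambda\mathcal{E})^n v = 0\}.$$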


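THEOREM 1. Let $\mathcal{A}: V \to V$ be a linear operator with characteristic polynomial
$$f_{\mathcal{A}}(t) = \prod_{i=1}^{p} (t - \lambda_i)^{n_i} \qquad (\lambda_i \neq \lambda_j \text{ for } i \neq j).$$
Then $V = V(\lambda_1) \oplus \cdots \oplus V(\lambda_p)$ is the direct sum of the root spaces $V(\lambda_i)$, each of which
is invariant with respect to $\mathcal{A}$ and satisfies $\dim V(\lambda_i) = n_i$. The operator $\mathcal{A} - \lambda_i\mathcal{E}$,
which is nilpotent on $V(\lambda_i)$, is non-singular on the subspace
$$V_i' = V(\lambda_1) \oplus \cdots \oplus V(\lambda_{i-1}) \oplus V(\lambda_{i+1}) \oplus \cdots \oplus V(\lambda_p).$$
Finally, $\lambda_i$ is the only eigen-value of the operator $\mathcal{A}|_{V(\lambda_i)}$.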


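Proof. None of the prime factors $t - \lambda_k$ can be a divisor simultaneously of all of
the polynomials
$$f_i(t) = \prod_{j \neq i} (t - \lambda_j)^{n_j},$$
and hence we have g.c.d.$(f_1(t), \ldots, f_p(t)) = 1$. Thus, we can find polynomials
$g_1(t), \ldots, g_p(t) \in \mathbb{C}[t]$ for which
$$\sum_{i=1}^{p} f_i(t) g_i(t) = 1. \tag{1}$$
The subspaces
$$W_i = f_i(\mathcal{A}) g_i(\mathcal{A}) V = \{f_i(\mathcal{A}) g_i(\mathcal{A}) v \mid v \in V\}$$
are invariant under $\mathcal{A}$:
$$\mathcal{A}W_i = f_i(\mathcal{A}) g_i(\mathcal{A}) \mathcal{A}V \subset f_i(\mathcal{A}) g_i(\mathcal{A}) V = W_i.$$
In addition,
$$(\mathcal{A} - \lambda_i\mathcal{E})^{n_i} W_i = f_{\mathcal{A}}(\mathcal{A}) g_i(\mathcal{A}) V = 0$$
(since $f_{\mathcal{A}}(\mathcal{A}) = 0$ by the Hamilton-Cayley theorem), so that
$$W_i \subset V(\lambda_i). \tag{2}$$
The relation (1), re-written in the form
$$\mathcal{E} = \sum_{i=1}^{p} f_i(\mathcal{A}) g_i(\mathcal{A}),$$
gives us:
$$V = \sum_{i=1}^{p} W_i,$$
and so all the more (because of the inclusion (2)):
$$V = \sum_{i=1}^{p} V(\lambda_i).$$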


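Suppose that $v \in V(\lambda_i) \cap V_i'$, where, as in the statement of the theorem, $V_i' = \sum_{j \neq i} V(\lambda_j)$.
Then $(\mathcal{A} - \lambda_i\mathcal{E})^n v = 0$, and since $v = \sum_{j \neq i} v_j$ with $v_j \in V(\lambda_j)$ and $(\mathcal{A} - \lambda_j\mathcal{E})^n v_j = 0$, it follows that we
also have $\prod_{j \neq i} (\mathcal{A} - \lambda_j\mathcal{E})^n v = 0$. But, because the polynomials $(t - \lambda_i)^n$ and
$c(t) = \prod_{j \neq i} (t - \lambda_j)^n$ are relatively prime, there exist a(t) and b(t) for which
$$a(t)(t - \lambda_i)^n + b(t)c(t) = 1.$$
We obtain
$$v = a(\mathcal{A})(\mathcal{A} - \lambda_i\mathcal{E})^n v + b(\mathcal{A}) \Big( \prod_{j \neq i} (\mathcal{A} - \lambda_j\mathcal{E})^n \Big) v = 0,$$
i.e., the spaces $V(\lambda_i)$ and $V_i'$ have intersection zero. Thus, we have the direct sum
decomposition
$$V = V(\lambda_1) \oplus \cdots \oplus V(\lambda_p) \tag{3}$$
into $\mathcal{A}$-invariant subspaces.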
The inclusion (2) and the decomposition (3) immediately imply that $W_i = V(\lambda_i)$.
We have thereby obtained the following explicit expression for $V(\lambda_i)$:
$$V(\lambda_i) = f_i(\mathcal{A}) g_i(\mathcal{A}) V,$$
where $f_i(t)$ and $g_i(t)$ are the polynomials in (1). In particular, we have
$$(\mathcal{A} - \lambda_i\mathcal{E})^{n_i} V(\lambda_i) = 0.$$
The minimal polynomial for $\mathcal{A}$ on $V(\lambda_i)$ must be a divisor of the polynomial $(t - \lambda_i)^{n_i}$.
This implies, first of all, that $\lambda_i$ is the only eigen-value for the operator $\mathcal{A}|_{V(\lambda_i)}$.
Furthermore, if we take a basis for V which is a union of bases for the $V(\lambda_i)$, then the
operator $\mathcal{A}$ has matrix
$$A = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_p \end{pmatrix},$$
where $A_i$ is an $n_i' \times n_i'$ matrix (with $n_i' = \dim V(\lambda_i)$) whose only eigen-value is $\lambda_i$
and whose characteristic polynomial is $f_{A_i}(t) = (t - \lambda_i)^{n_i'}$, where $n_i' \leq n_i$. Since
$f_{\mathcal{A}}(t) = \prod_{i=1}^{p} f_{A_i}(t)$, it follows that $n = n_1' + \cdots + n_p'$ and $n_i' = n_i$.

It remains to prove that the restriction $(\mathcal{A} - \lambda_i\mathcal{E})|_{V_i'}$ is non-singular. But this is
clear: otherwise we would have $\operatorname{Ker}(\mathcal{A} - \lambda_i\mathcal{E}) \cap V_i' \neq 0$ and $\mathcal{A}v - \lambda_i v = 0$ for some
non-zero $v \in V_i'$. But the characteristic polynomial for $\mathcal{A}$ on $V_i'$ is
$\prod_{j \neq i} (t - \lambda_j)^{n_j}$, and $\lambda_i$ cannot be an eigen-value. □
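
Theorem 1 can be illustrated on a small matrix. The sketch below assumes exact rational arithmetic and uses the description $V(\lambda) = \operatorname{Ker}(\mathcal{A} - \lambda\mathcal{E})^n$ obtained above, so that $\dim V(\lambda) = n - \operatorname{rank}(A - \lambda E)^n$. The helper names `rank` and `root_space_dim` are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(a):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in a]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def root_space_dim(a, lam):
    """dim V(lam) = n - rank (A - lam E)^n."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(n):
        power = matmul(power, shifted)
    return n - rank(power)
```

For A = [[2, 1, 0], [0, 2, 0], [1, 3, 5]], with $f_A(t) = (t-2)^2(t-5)$, the root spaces have dimensions 2 and 1, matching the exponents $n_i$, although the eigen-subspace $V^2$ is only one-dimensional.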

j#i ! : 

3. Theorem 1 reduces the problem of choosing the simplest possible matrix for
$\mathcal{A}: V \to V$ to the case when $\mathcal{A}$ has only one eigen-value $\lambda$, and $(\mathcal{A} - \lambda\mathcal{E})^m = 0$,
$m \leq \dim V$. If we set $\mathcal{B} = \mathcal{A} - \lambda\mathcal{E}$, we obtain a nilpotent operator with index of nilpotence
m and with matrix B.

THEOREM 2. The Jordan normal form J(B) exists for the nilpotent matrix B
(here the ground field K can be arbitrary).


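Proof. We must show that the vector space V on which the nilpotent operator $\mathcal{B}$
acts with matrix B splits into a direct sum of so-called cyclic subspaces
$K[\mathcal{B}]v_i = \langle v_i, \mathcal{B}v_i, \ldots, \mathcal{B}^{m_i - 1} v_i \rangle$ with $\mathcal{B}^{m_i} v_i = 0$. We would like to use induction on the
dimension of the space. Suppose that the theorem holds for all pairs $(V', \mathcal{B}')$, where
$\dim V' < \dim V$ and $\mathcal{B}'$ is a nilpotent operator on V'.

Suppose that $\mathcal{B}^m = 0$, $\mathcal{B}^{m-1} u \neq 0$. We introduce the cyclic subspace
$U = \langle u, \mathcal{B}u, \ldots, \mathcal{B}^{m-1} u \rangle$ and the quotient space $\bar{V} = V/U$, and we define the quotient
operator $\bar{\mathcal{B}}$ on $\bar{V}$ in the usual way: $\bar{\mathcal{B}}\bar{v} = \overline{\mathcal{B}v}$. Here $\bar{v} = v + U$ is the coset with
representative v. Since $\bar{\mathcal{B}}^m \bar{v} = \overline{\mathcal{B}^m v} = \bar{0}$, it follows that $\bar{\mathcal{B}}$ is a nilpotent operator
with index of nilpotence $\bar{m} \leq m$. In other words, $\mathcal{B}^{\bar{m} - 1} V \not\subset U$ while $\mathcal{B}^{\bar{m}} V \subset U$.

Since $\dim \bar{V} < \dim V$, by the induction assumption we have
$$\bar{V} = \bar{U}_1 \oplus \cdots \oplus \bar{U}_{s-1}, \qquad \bar{U}_i = K[\bar{\mathcal{B}}]\bar{u}_i.$$
We obtain the decomposition of V
$$V = U_1 \oplus \cdots \oplus U_{s-1} \oplus U, \tag{4}$$
where
$$U_i = \langle u_i, \mathcal{B}u_i, \ldots, \mathcal{B}^{m_i - 1} u_i \rangle, \qquad m_i = \dim \bar{U}_i.$$
The subspaces $U_i$ are not $\mathcal{B}$-invariant, since, in general, $\mathcal{B}^{m_i} u_i \neq 0$.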
i 


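For convenience, for fixed i we set $w = u_i$, $\ell = m_i$, $W = U_i = \langle w, \mathcal{B}w, \ldots, \mathcal{B}^{\ell - 1} w \rangle$.
By assumption,
$$\mathcal{B}^\ell w = \alpha_k \mathcal{B}^k u + \alpha_{k+1} \mathcal{B}^{k+1} u + \cdots + \alpha_{m-1} \mathcal{B}^{m-1} u, \qquad \alpha_k \neq 0$$
(if all the $\alpha_j$ are zero, we have nothing left to do). Applying the operator $\mathcal{B}^{m-1-k}$ to this
equality, we obtain $\mathcal{B}^{m-1-k+\ell} w = \alpha_k \mathcal{B}^{m-1} u \neq 0$. Since $\mathcal{B}^m = 0$, this is only possible
if $\ell \leq k \leq m-1$. Setting
$$v = w - \alpha_k \mathcal{B}^{k-\ell} u - \cdots - \alpha_{m-1} \mathcal{B}^{m-1-\ell} u,$$
we find that $\mathcal{B}^{\ell-1} v = \mathcal{B}^{\ell-1} w + u' \neq 0$ (with $u' \in U$), but
$$\mathcal{B}^\ell v = \mathcal{B}^\ell w - \alpha_k \mathcal{B}^k u - \cdots - \alpha_{m-1} \mathcal{B}^{m-1} u = 0.$$
The cyclic space $\langle v, \mathcal{B}v, \ldots, \mathcal{B}^{\ell-1} v \rangle$ with $\mathcal{B}^\ell v = 0$, together with U,
generates the subspace $U_i \oplus U$.

This argument holds for any i, $1 \leq i \leq s-1$, so that in (4) we can replace
each subspace $U_i$ by $V_i = \langle v_i, \mathcal{B}v_i, \ldots, \mathcal{B}^{m_i - 1} v_i \rangle$, $\mathcal{B}^{m_i} v_i = 0$. Further setting
$v_s = u$, $m_s = m$, and $V_s = U$, we obtain the decomposition
$$V = V_1 \oplus \cdots \oplus V_s,$$
which has all the required properties. □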


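4. We now prove uniqueness. At the same time we give a practical method for
reducing an arbitrary n × n matrix A to Jordan normal form.

To do this we must be able to find the number $N(m, \lambda)$ of Jordan cells $J_{m,\lambda}$ of
order m corresponding to the eigen-value $\lambda$ of A. As usual, we let the matrix A
correspond to an operator $\mathcal{A}$ which acts on an n-dimensional vector space V. We
decompose V into the direct sum
$$V = \bigoplus_{j=1}^{s} \mathbb{C}[\mathcal{A}]v_j \oplus V', \tag{5}$$
where
$$\mathbb{C}[\mathcal{A}]v_j = \langle v_j, (\mathcal{A} - \lambda\mathcal{E})v_j, \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_j - 1} v_j \rangle, \qquad (\mathcal{A} - \lambda\mathcal{E})^{m_j} v_j = 0, \qquad V' = \bigoplus_{\mu \neq \lambda} V(\mu).$$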


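We shall compute the rank $r_t = \operatorname{rank}(A - \lambda E)^t$ of the matrix $(A - \lambda E)^t$, or, equivalently,
the dimension of the space $(\mathcal{A} - \lambda\mathcal{E})^t V$. Of course, this dimension does not depend on the
choice of basis in V. Each of the spaces in (5) is invariant relative to $(\mathcal{A} - \lambda\mathcal{E})^t$; hence
$$\dim (\mathcal{A} - \lambda\mathcal{E})^t V = \sum_{j} \dim (\mathcal{A} - \lambda\mathcal{E})^t\, \mathbb{C}[\mathcal{A}]v_j + \dim (\mathcal{A} - \lambda\mathcal{E})^t V'.$$
If $m_j \leq t$, then $(\mathcal{A} - \lambda\mathcal{E})^t\, \mathbb{C}[\mathcal{A}]v_j = 0$. For $m_j > t$ we have
$$(\mathcal{A} - \lambda\mathcal{E})^t\, \mathbb{C}[\mathcal{A}]v_j = \langle (\mathcal{A} - \lambda\mathcal{E})^t v_j, \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_j - 1} v_j \rangle,$$
so that $\dim (\mathcal{A} - \lambda\mathcal{E})^t\, \mathbb{C}[\mathcal{A}]v_j = m_j - t$. The operator $\mathcal{A} - \lambda\mathcal{E}$ is non-singular on V'
(by Theorem 1), so that $\dim (\mathcal{A} - \lambda\mathcal{E})^t V' = \dim V'$. We obtain
$$r_t = \sum_{m_j > t} (m_j - t) + \dim V',$$
so that
$$\begin{aligned} r_t - r_{t+1} &= \sum_{m_j > t} (m_j - t) - \sum_{m_j > t+1} (m_j - t - 1) \\ &= \sum_{m_j = t+1} 1 + \sum_{m_j > t+1} 1 = N(t+1, \lambda) + N(t+2, \lambda) + \cdots \end{aligned}$$
Hence, $(r_{m-1} - r_m) - (r_m - r_{m+1}) = \sum_{k \geq m} N(k, \lambda) - \sum_{k \geq m+1} N(k, \lambda) = N(m, \lambda)$,
and we finally obtain the formula
$$N(m, \lambda) = r_{m-1} - 2r_m + r_{m+1}, \tag{6}$$
where $r_t = \operatorname{rank}(A - \lambda E)^t$ and $r_0 = n$.

We note that $r_t$ is an invariant of the matrix A (i.e., a number that depends
only on the class of matrices similar to A). Hence, the uniqueness of the Jordan normal
form J(A) is also established by the formula (6).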
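
Formula (6) translates directly into a procedure for reading off the Jordan cells belonging to a known eigen-value. The following sketch is only an illustration under the assumption of exact rational entries: it computes the ranks $r_0, r_1, \ldots, r_{n+1}$ of the powers of $A - \lambda E$ by Gaussian elimination and applies (6). The names `rank` and `jordan_cell_counts` are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(a):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in a]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def jordan_cell_counts(a, lam):
    """Return {m: N(m, lam)} for the non-zero counts given by formula (6)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    ranks = [n]                                    # r_0 = n
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(n + 1):                         # r_1, ..., r_{n+1}
        power = matmul(power, shifted)
        ranks.append(rank(power))
    counts = {}
    for m in range(1, n + 1):
        count = ranks[m - 1] - 2 * ranks[m] + ranks[m + 1]
        if count:
            counts[m] = count
    return counts
```

For A = [[2, 1, 0], [0, 2, 0], [1, 3, 5]] this gives one cell of order 2 for $\lambda = 2$ and one cell of order 1 for $\lambda = 5$, so that J(A) consists of $J_{2,2}$ and $J_{1,5}$.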


So far we have said nothing about the matrix C which realizes the reduction
$$C^{-1}AC = J(A).$$
But since we now know the matrices A and J(A), we can find $C = (c_{ij})$ from the
homogeneous system of $n^2$ linear equations
$$C J(A) - AC = 0.$$
Let $C_1, \ldots, C_k$ be a fundamental system of solutions. In general, not all of the $C_i$ are
non-singular matrices, but, since the Jordan normal form J(A) exists, it follows that
$\det(t_1 C_1 + \cdots + t_k C_k) \neq 0$ with indeterminate coefficients $t_1, \ldots, t_k$, and it is
possible to choose $a_1, \ldots, a_k \in \mathbb{C}$ for which $\det(a_1 C_1 + \cdots + a_k C_k) \neq 0$. Then
$C = a_1 C_1 + \cdots + a_k C_k$ can serve as the change of basis matrix which reduces A to
J(A). Of course, C is by no means uniquely determined, even if we normalize by
requiring that $\det C = 1$.


Hints to the Exercises 


29 


Ga 


For any natural number n, the number $4n! - 1$ has at least one prime
divisor p of the form 4k-1, where p > n.

Set n = 2 and $m = p_1 p_2 \cdots p_s$, where $p_1, \ldots, p_s$ are different
primes of the form $p_i = 4k_i + 1$. Then every prime divisor p of the odd
number $n^2 + m^2$ has the form 4k+1, where p does not belong to the
set $\{p_1, p_2, \ldots, p_s\}$.

In case of difficulty, the reader can turn to §4 of Chapter 4, where this is
proved using more sophisticated considerations.


Let dim Vie (GA) ae dim V, (A) = gs, Choose r rows which form a basis; 
without loss of generality we may assume that they are the first r rows 


A,,A,,e+-, A_-+ Consider the shortened rx matrix 
cane? eee: 


Aatne A ] made up ofthe first r rows of A. Choose t 
£ 


pd 


columns of A which forma basis, where t = dim Le (A). Without loss of 
; Ol) ~(t) d os r 
generality we may assume they are Re cao oY 6 Miles Me (AJC R , 


we have t <r. Next, prove that s <t as follows, For every column 


k 
, k >t, wemust find scalars } a ye 


wor? d, e¢ IR such that 


t 
1 @)  . i 
= } ue e+ AA ) Gag Baa = > hp ip? 1<i< m. Choose 


P=! 


CHO Fh i so that ee = Ay (ve teee + oe A : in the shortened matrix. 


Then a,, = Veal) for t= 1 (Ror san use ile expression 


Ae iene tne + Ae Tor the i-th row as a linear combination of the 
i al ie ie : 


first xr rows. We then have: 


104 


104 


105 


562 


r ie t i fg 
eS ba = LB x a = ny Dome = 
Us 2, Haken pa bd Mee 225s 2 £ 4p 
t 
= >» Apap’ Thus, s<t, and, since t<1r, wehave s<r, Next, 
p=l 


ci ee 
tes le toe top ee 
rie on ooe a om 


which is the nym matrix whose rows are the columns of A and whose 

t 
columns are the rows of A. We have: 1 CA) = r (A) ; ri A) = r (A) ; 
and so, applying the inequality we proved to cs » we obtain r<s. Thus, 


r= 58. 


Notice that, if a matrix B has basis columns with indices $i_1, \ldots, i_r$, then
all of the columns in AB can be expressed as a linear combination of the
columns $(AB)^{(k)}$, $k = i_1, \ldots, i_r$. The same can be said for the transpose
matrix ${}^t(AB) = {}^tB \cdot {}^tA$.


Consider the restriction © = Oly of the map to an arbitrary subspace 

Vo R™., tis obvious that Kero c Kerg. Hence, by Theorem 1, which is 
applicable, because we already know that V_ can be interpreted as Rs for 
some k<m, wehave: dimV- rank © = dim Ker < dim Kero. Thus, 
dim V - dim (V) < dim Ker, Setting V = ¥(R") = Img, we finally obtain: 
dim Ker gy = n- rank py = (n - rank y) + (dim V - rank ©) < dim Ker y + 


+ dim Ker. 


Show that A = [Xp yeee, XV ees Yee 


563 


Page 
129 3, Usethe equality det C = det C' and relations (4) and (5). 
130 6. Recall Example 3 of Subsection 3 §3 Ch, 2, and take note of the fact that 
eee, (lye, i) = (- 1)" det e Cilvecey 0) 
137 5, ‘If this is not the case, use the criterion in Exercise 3. Namely, if 
fo) ) 


[x, pene X ] is a non-trivial solution of the linear system AX = 0 and if 


© » , F 
x, isa component with maximal absolute value, thenthe k-th equation 


aX. + a a = 0 gives us the estimate (n - Dla | |x? | = 
Keke A Oeics a 
Oo fe) eee ae 
= (in = 1) a os, |) << (are 1) la | |x | , which is a contradiction, 
rae kj j kkk 
137 6, = Since ci. See By ,» where C = (c,,) , repeated application of the rule for 


expanding a determinant along a row (Theorem l' of §2 Ch, 3) gives 


1k, 1k, 1k. 
ml ve "2k eee "2K 
Cea SS | eae ie 
ey kee ae 
a a eee a 
nk, nk, nk, | 


where the summation is over all pair-wise distinct ky 7 ooG y kK - £fm<n7, 
there are no such indices, andhence det C= 0. Butif m>n, then 
Kiseeey KE is a choice of numbers ipeeodad mee Wa Ba oag q wile 


taken in some order. Then collect all terms corresponding to a fixed set 


ALI Prana Jot , and, using the complete expansion of a determinant, obtain 


Page 


182 


205 


206 


Whe 


564 


1k nk, i, nj, 
= ao all o Bee is) S 
eae were oe an 
i n T n 
a Seg a seve fa 
1k nk 14, nt 
Ao. eoo Gh, 16). gaa Je), 
a] nj) i) ue eco 
= . where 7 = é 
ha . e o o ° . ° °° ° . e ° o ° 3 
ky vee ke 
a. ooo b, coe 1 
1h ne i} in 


Consider the partition of G into pairs $\{g, g^{-1}\}$.


Use the Fundamental Theorem of Arithmetic in §8 Ch. 1. Does this group have 


any finite set of generators? 


Find an upper estimate for the number of different Cayley tables of order n.
Use Theorem 2 to show that $\rho(n)$ is at most equal to the number $\binom{n!}{n}$ of
different subsets of n elements in $S_n$. Actually, $\rho(n)$ is much less, but
no good estimate, i.e., close to a best possible estimate, has been found for
$\rho(n)$.


Partition G_ first into left and then into right cosets with the same 
representatives. 


2 - - 
If x =e forall xeG, then ebab = ahah ae 


a Sil 
=b(b -) (a ') a = beea = ba. 
Use induction on m together with the fact that the binomial coefficient $\binom{p}{k}$ is
divisible by p if 0 < k < p.


Use proof by contradiction, Let N = {a, OR ea be the set of all non-zero 
non-invertible elements, The map p, a. > Xa, is a bijection N-N _ for 


any non-zero xé R\N. ‘The kernellofthe map p.x > p,, is infinite, 


Page 
BPM 6. 
SIS) 4, 
DSO FI. 
AG 5 
277 =A, 
mS Mal, 


565 


Answer. The identity automorphism and the map a + bd E> el oD bd. 


After proving that 
m 
~ ee 2 Gos 
i ) 
= k \ m 
use induction on m. 


Suppose the contrary: Bleseoy) = (8 Conc 4 Maes 20+), (e0- Beng Goode snares 


1 ij ij 

det (9) is a linear homogeneous polynomial in the variables in a fixed column, 

it follows that one of the 81> 8 is a linear homogeneous polynomial in the 

X., I <a << in 
gy 2 - = 


Xi . Apply the same argument to the variables in a fixed row. Now suppose, for 


, fora fixed j, while the other factor does not involve those 


example, that X11, 4ppears in B)° Then 85 does not contain ae 


ieee and then 8, does not contain x,,, l1<i,j<n, iLe., 8, 


Y 


? 


is a constant. 
Answer. No. 


Since f is homogeneous, clearly f£(0,..., 0) = 0. Suppose the theorem is 
false, ie., (a, 666 5 a) (On coe, OC) = f(a,, Gan 9 a.) 7 0. Using 


Exercise 3 and Fermat's Little Theorem, show that the reduced polynomial for 


p-l 2 
B(X eee 6 x) ei- £(X , ees x) must then be BX (KX yo ; x) = 


—(p- xP Soe! 
= (1 xy Dean Wl x ). But 

den oes (ose idea t= Wr pe) ne der ae 
This contradiction proves the theorem. 


Apply the corollary to Gauss's Lemma, in §3 Ch, 5, and the previous exercise, 


and use the fact that Z(X] is a unique factorization domain. 


Page 


296 


317 


318 


318 


318 


318 


318 


566 


i 


Consider f asa polynomial in X, ee coefficients in KLX, peoey X 


Notice that, since f is skew-symmetric, f= 0 for x, =X and 


hence, f is divisible by xX = Xuel . 


For x > 1 start with the inequality 


e out mah yomtl ae 
f(x) > a) x il 2 areerereer eee ck eerie la) (x - 1)-B] 5 


Differentiate the formal expression f(X) = De b (X 2 a k times and set 


“<= @ 
Use Exercise 3, 


It is useful to consider the "duat" polynomial X°£(1/X) and use formulas (12) 


of §1 and (9) of §2. 


Use the fact that f(X) = (X -c) g(X) = g(X) e Z[X]. Now use these facts to 


find the integer roots of the polynomial x4 + x? e x? + 40X - 100 (answer: 


; ae n 
If c= a/b isa fraction in lowest terms, then a /b = -a.a oe le) 


mee ~ 2 ee 
n 


Using Theorem 1, factor f(X) into polynomials of the form (X + a) + bo 


and then apply the formal identity 
2 2 2 2 2 2 
(Pp +q)(r +s) = (pr+qs) + (ps-aqr) , 
which comes from the relation 


[eo eet oe eigie se 


341 


341 


342 


365 


366 


Wi, 300) = 5 ge a bxX +c = (x? + @X +B) (X + 6), where a=aQ@+6, 
b=8+06, c= 80, with @,8,@c¢ R. Stability of £(X) is equivalent 
to stability of the two polynomials x? +OX+ 8, X+6, i.e., tothe 
inequalities @>0, B>0, @> 0. It is easy to check that this system of 
inequalities is equivalent to: a>O, b>0, c>0, ab-c>O. Similar 
considerations apply to a fourth degree polynomial in REX]. 

I, Take H_ to be the stationary subgroup G, ofa point l1eQ, use (3), and 
set o (i) eae 

3. Notice that all elements of P have the form 

EEL if at @ i Oo) @) i @ a 
g = A BIC‘, where Av= |0 10], B= flo iil, c = foi of ; 
Om Om 00 1 io) 0) al 
if g ¢ Z(P), then Cy(g) = (g) Z(G), [C,(@)| =p. 

At ces) and 7 = Tree Ts then omc = of oO ane om oO ; in 
addition, o* (iy Lee i) . =o =(¢ (i) afi.) aor o (i) for any cycle 
(i, i, 06 i) of length k. 

8. In De N(g), each element x € Q is counted Ist (x)| times. Thus, the 
elements in the same orbit as x give a contribution to the sum equal to 
(G:St(x)) + [stg] = Ic]. 

11, Count the number of elements of order 2, or make use of the result in Exercise 
OR 

14. aba= fee = neo = a = aba: As = bab ‘ ei = ees ‘aba = 


567 


= ba . Conclude from this that ab = ba, and so, if we take into account the 


other relations, we have b=e. 


384 


412 


420 


420 


420 


420 


432 


568 


The beginning of the proof is the same. After taking a cyclic group $\langle a \rangle$ of
maximal order m in A, take a subgroup B such that $\langle a \rangle \times B$ is
maximal. If this direct product is A, then we are done. Otherwise, consider
an element $c \in A$ not in $\langle a \rangle \times B$ but such that $c^p \in \langle a \rangle \times B$ (where p
is a prime). Now work in the group $\langle c, \langle a \rangle \times B \rangle$, and try to write it in the
form $\langle a \rangle \times B'$, where $B' \supset B$.
it iat 


Differentiate the equation e = 1 with respect to t and then set 


t= 0, 
See the proof of Theorem 5 in §3 Ch. 7. 


Using the fact that all elements of order 2 are conjugate, show that they form a
"bouquet" (Fig. 21) of five pair-wise disjoint (except for the identity e)
conjugate Sylow subgroups of order 4. The group I acts on the "bouquet"
by conjugation. This action is faithful, since I is a simple group (see
Exercise 1).
Apply the homomorphism theorem to $\Phi: SU(2) \to SO(3)$.


Use the ideas in the computation of the number of necklaces (Problem 2 at the 


beginning of the chapter). 


Rewrite (4) and (5) in the form 
(rg Wy 5 
nigee 
©, Xs (e) 


-1 -1 
Ic oy HO i@ = 6 


Multiply both sides by bj (h) and sumover j, taking into account the 


equality v,.(h) vd. (g) = o,. (hg). Inthe resulting relation 
ao Kjg 


By (0) 8, 


oD teasG. @ f 24. 6 
y kig igi bw Xa (©) 


Page 


432 


433 


445 


446 


453 


453 


569 


Ger je = Ie plieté) th Sa 


jg 0 , andthen sum over i and k _ to obtain the 


characters, 


Let $\Phi$ be the irreducible representation, and let h be an element of G.
Since G is commutative, we have $\Phi(g)\Phi(h) = \Phi(h)\Phi(g)$, $\forall g \in G$. Setting
$\sigma = \Phi(h)$ in Schur's lemma, we obtain: $\Phi(h) = \lambda_h \mathcal{E}$. This holds for any
$h \in G$. Since $\Phi$ is irreducible, the only possibility is that $\Phi$ is one-
dimensional.
‘ -l 
By assumption, Sale = 5 for some matrix C = (c,)) 2 GL, Gy), “Ware 
operation A > A* = a applied to ee = ee » gives eC = (CP wv 
; g 


-l - 
and hence | Cc Cy con . By Schur's lemma, wehave C*C =iE. 
~ 2 -1 
Furthermore, A = SS lea =uu, we @, and U=y C is the 
= ey 


desired matrix. 


Since a XX) =a) x) ay X.) , it follows that a’ is a character of the 
” : ante ee OME ‘i Z ej 
group A. Since (aa') =a (a') , it followsthat 7 is a homomorphism 


ELOTIEEANE CLO ae Furthermore, 

KerT = fac Ala’ (y) = y= ly Ke A= Ker7 = ¢€ , 
and so lA | = |A| = lA] implies that 7 is an isomorphism, 
Agee SB, 1, oi, 0, Wy 


Comparing dimensions, obtain a direct sum decomposition 
2 


2 2, yD, 
@ (x +y oe etek 


2 2 2 
> H 
ae ieee $y) +z ) i a 


a2 


Since SO(3) is simple, if 7 were non-trivial it would mean that 7 isa 
faithful representation of degree 2. But we can see from Example 3 of Sub- 


section 4 §5 or from the description of the finite subgroups of SU(2) in §3 


570 


Page 


that even the restriction of 7 to Sy = @O cannot be faithful. 


471 3. There exist non-singular matrices C and D_ such that 


Hence 


1 


A'@B = (C@D)(A@B(C 1e@D 4) = (C@ D)(A®@B)(C@D)- 


is a triangular matrix with diagonal entries . B; . These diagonal entries are 
the eigen-values of the matrix A’ ® B’ » andhence of A@B. We have: 
det(A@B) = af = (ma,)™(m8.\" = (det A)™ (det BY. 
~; Ld il i od 
IJ 
493 4. Consider the splitting field $F$ of the polynomial $X^p - a$. Let $\theta \in F$ be one of the roots, so that $a = \theta^p$ and $X^p - a = (X - \theta)^p$. If it is possible to write $X^p - a = u(X)\,v(X)$, where $u(X)$ is a monic polynomial over $K$ having positive degree $m < p$, then, since $F[X]$ is a unique factorization domain, we must have $u(X) = (X - \theta)^m$. In particular, $\theta^m \in K$; since $\theta^p \in K$ and $m$ is relatively prime to $p$, we have $\theta \in K$.
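The identity $X^p - \theta^p = (X - \theta)^p$ used in this hint rests on the fact that in characteristic $p$ the inner binomial coefficients vanish. A minimal sketch checking this, assuming Python 3.8+ for `math.comb` (the function name `binomials_mod` is made up for this illustration):

```python
from math import comb


def binomials_mod(p):
    """Binomial coefficients C(p, 0), ..., C(p, p) reduced modulo p."""
    return [comb(p, k) % p for k in range(p + 1)]


# For a prime p only the two outer coefficients survive modulo p.
print(binomials_mod(5))
# For a composite modulus some inner coefficient is nonzero.
print(binomials_mod(4))
```

For prime $p$ the list has the form 1, 0, ..., 0, 1, which is exactly the statement $(X - \theta)^p = X^p + (-\theta)^p$ in characteristic $p$.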
494 5. By the preceding exercise, it suffices to verify that the equality

$X^p - Y = \left(X - \frac{g(Y)}{h(Y)}\right)^p$ with $g, h \in \mathbb{Z}_p[Y]$ is impossible.


494 6. By (8), we have $X^d - 1 = \prod_{s \mid d} \Phi_s(X)$. Hence

$X^n - 1 = \prod_{s \mid n} \Phi_s(X) = (X^d - 1)\,\Phi_n(X) \prod_{s \mid n;\ s \nmid d;\ s \ne n} \Phi_s(X)$.

It remains to refer to (10).


494 7. Since $\Phi_n(X) = \prod (X - \varepsilon)$, where $\varepsilon$ runs through the primitive $n$-th roots of 1, it follows that when $n > 1$ all of the $\varepsilon$ are $\ne 1$, and so the distance in the complex plane from $q$ to any $\varepsilon$ is greater than the distance from $q$ to 1. Hence, $|\Phi_n(q)| = \prod |q - \varepsilon| > q - 1$, and there's no way $\Phi_n(q)$ can divide $q - 1$.
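The inequality $|\Phi_n(q)| > q - 1$ can be checked on small values. The sketch below is illustrative only: it builds $\Phi_n$ from the factorization $X^n - 1 = \prod_{s \mid n} \Phi_s(X)$ by exact division, and the names `divide_exact`, `cyclotomic`, and `evaluate` are assumptions of this example. Polynomials are stored as integer coefficient lists starting from the constant term.

```python
_PHI_CACHE = {}


def divide_exact(num, den):
    """Quotient of integer polynomials num / den, where den is monic and divides num."""
    num = num[:]
    deg_den = len(den) - 1
    quotient = [0] * (len(num) - len(den) + 1)
    for k in range(len(quotient) - 1, -1, -1):
        quotient[k] = num[k + deg_den]          # den is monic
        for j, c in enumerate(den):
            num[k + j] -= quotient[k] * c
    assert all(c == 0 for c in num), "division was not exact"
    return quotient


def cyclotomic(n):
    """Coefficients of Phi_n, obtained by dividing X^n - 1 by Phi_d for all d | n, d < n."""
    if n not in _PHI_CACHE:
        poly = [-1] + [0] * (n - 1) + [1]       # X^n - 1
        for d in range(1, n):
            if n % d == 0:
                poly = divide_exact(poly, cyclotomic(d))
        _PHI_CACHE[n] = poly
    return _PHI_CACHE[n]


def evaluate(poly, x):
    """Value of the polynomial at the integer x."""
    return sum(c * x ** i for i, c in enumerate(poly))


# Check |Phi_n(q)| > q - 1 for a few n > 1 and integers q >= 2.
print(all(abs(evaluate(cyclotomic(n), q)) > q - 1
          for n in range(2, 13) for q in range(2, 8)))
```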


Work with the equation $x^2 + y^2 - z^2 = 0$ with $x, y, z \in \mathbb{Z}_p$. According to Chevalley's theorem (see Exercise 4 §1 Ch. 6), the total number $N$ of solutions of this equation is divisible by $p$. Suppose that there is no solution with $xyz \ne 0$. Compute $N$ by considering two cases separately. If no $a \in \mathbb{Z}_p$ exists for which $a^2 + 1 = 0$, then the only solutions are $(0, 0, 0)$, $(0, n, \pm n)$, $(n, 0, \pm n)$, $n = 1, 2, \ldots, p-1$, and hence $N = 4p - 3 \equiv 0 \pmod p \implies p = 3$. If $a^2 + 1 = 0$ for some $a \in \mathbb{Z}_p$, then $N = 6p - 5 \equiv 0 \pmod p \implies p = 5$.
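The counting argument can be checked by brute force for small primes. This sketch assumes the equation in question is $x^2 + y^2 - z^2 \equiv 0 \pmod p$, as the listed solutions $(0, n, \pm n)$, $(n, 0, \pm n)$ suggest; the function names are made up for the illustration.

```python
def count_solutions(p):
    """Number of triples (x, y, z) modulo p with x^2 + y^2 - z^2 = 0."""
    return sum(1
               for x in range(p) for y in range(p) for z in range(p)
               if (x * x + y * y - z * z) % p == 0)


def has_solution_with_nonzero_coordinates(p):
    """True if some solution has x, y and z all nonzero modulo p."""
    return any((x * x + y * y - z * z) % p == 0
               for x in range(1, p) for y in range(1, p) for z in range(1, p))


for p in (3, 5, 7, 11, 13):
    # N should be divisible by p in every case.
    print(p, count_solutions(p), has_solution_with_nonzero_coordinates(p))
```

In this small range only $p = 3$ and $p = 5$ lack a solution with $xyz \ne 0$, in agreement with the two exceptional cases of the count.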
Answer: not in general. 


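Using Exercise 9 of §1, consider a primitive 8-th root of one $\alpha$ in the algebraic closure $\Omega_p$ of the field $\mathbb{Z}_p$. Since $\alpha^8 = 1$ and $\alpha^4 \ne 1$, we have $\alpha^4 + 1 = 0$; in addition, $\alpha^5 = -\alpha$ and $\alpha^{-5} = -\alpha^{-1}$, so that $\alpha^5 + \alpha^{-5} = -(\alpha + \alpha^{-1})$. Setting $\beta = \alpha + \alpha^{-1}$, we have $\beta^2 = \alpha^2 + \alpha^{-2} + 2 = 2$, so that

$p \equiv \pm 1 \pmod 8 \implies \beta^p = \alpha^p + \alpha^{-p} = \alpha + \alpha^{-1} = \beta \implies \beta \in \mathbb{Z}_p \implies \left(\frac{2}{p}\right) = 1$.

Similarly,

$p \equiv \pm 5 \pmod 8 \implies \beta^p = \alpha^p + \alpha^{-p} = \alpha^5 + \alpha^{-5} = -\beta \implies \beta \notin \mathbb{Z}_p \implies \left(\frac{2}{p}\right) = -1$.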
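The resulting rule for the quadratic character of 2 can be compared with Euler's criterion on small primes. A minimal sketch, in which the helper names `legendre_two`, `predicted_by_residue_mod_8`, and `odd_primes_below` are assumptions of this example:

```python
def legendre_two(p):
    """Legendre symbol (2/p) for an odd prime p, via Euler's criterion."""
    return 1 if pow(2, (p - 1) // 2, p) == 1 else -1


def predicted_by_residue_mod_8(p):
    """Value +1 when p = +-1 (mod 8) and -1 when p = +-5 (mod 8)."""
    return 1 if p % 8 in (1, 7) else -1


def odd_primes_below(limit):
    """Odd primes less than limit, found by trial division."""
    return [n for n in range(3, limit, 2)
            if all(n % d for d in range(3, int(n ** 0.5) + 1, 2))]


# Both computations should agree for every odd prime tested.
print(all(legendre_two(p) == predicted_by_residue_mod_8(p)
          for p in odd_primes_below(200)))
```

For instance $3^2 \equiv 2 \pmod 7$, so 2 is a square modulo 7, and $7 \equiv -1 \pmod 8$.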




If $n = 1$ and $f(x) = \sum_{i=0}^{m} a_i x^i$, $a_m \ne 0$, then

$f(xy) = \sum_{i=0}^{m} a_i x^i y^i = f(x)\,f(y) = f(x) \sum_{i=0}^{m} a_i y^i$,

where $x$ and $y$ are independent variables. Equating coefficients of $y^m$, we see that $a_m x^m = f(x)\,a_m$, and so $f(x) = x^m$. Now suppose $n > 1$. Setting $g(x) = f(xE)$, we have $g(xy) = g(x)\,g(y)$. This, along with the fact that we proved our claim when $n = 1$, implies that $g(x) = x^s$. Since $X X^{\vee} = (\det X)\,E$, it follows that

$f(X)\,f(X^{\vee}) = f((\det X)E) = g(\det X) = (\det X)^s$.

But $f(X)$, $f(X^{\vee})$ and $\det X$ are polynomials in the $x_{ij}$, $1 \le i, j \le n$, and $\det X$ is an irreducible polynomial (see Exercise 7 §3 Ch. 5). By Theorem 2, which says that a polynomial ring in any number of variables has unique factorization, we have $f(X) = c\,(\det X)^m$, where $c$ is a constant, and $f(XY) = f(X)\,f(Y) \implies c^2 = c$, and, since $c \ne 0$, we have $c = 1$.


Index 


Abelian group 146 

Adjoint matrix 130 

Affine transformations of the 
real line 162

Algebra over a field 524 

-- Lie algebra 539

-- of linear operators 517 

-- of quaternions 528 

Algebraic element 227, 473 

-- extension 475 

-- integer 521

-- operation 138

Algebraically closed field 297, 494 

-- independent elements 230 

Alternating group 161 

Annihilator 515 

Associative ring 183 

Associator 539 

Automorphism 168, 198 


Basis columns 75 

-- for induction 51 

-- minor 135 

-- of an abelian group 381
-- of a vector space 67 
Bezout's theorem 264 
Bijective mapping 38
Binary relation 44 
Binomial formula 53 
Burnside's theorem 536 


Cancellation law in an integral 
domain 196 

Cartesian product 36 

Cayley table 165 

Center of an associative algebra 525 

-- of a group 335 

Central function on a group 427 

-- simple algebra 526 

Character of a representation 424 

-- table 437 

Chevalley's theorem 277-278 

Chinese remainder theorem 505 

Cofactor 119 

Commutant of a group 348, 350 

Commutative diagram 39 

Commutator of two elements 348 

Completely reducible representation 396

Complex conjugation 209 

Complex plane 209-210 

Composition of mappings 39 

Congruence 187, 448 

Conjugacy class 335 

Constructive number field 219, 491 

Convolution of functions 532-533 

Cramer's rule 133 

Crystallographic group 448 

Gwelle i135) 

Cyclotomic field 488 


Decomposable representation 395 
Degree of a polynomial 225 

Derived subgroup 348, 349 
Descartes’ rule of signs 310 
Differentiation 269, 540 

-- operator 270 

Dimension of an algebra 524 

-- of a vector space 69 

Direct product of groups 354, 356 
== ch S95), SOA. Sul7 

Distributive laws in a ring 183 
Division algorithm 57, 232, 244-245 
Divisor of zero in a ring 195 

Dual representation 454 

Duality law for finite abelian groups 445


Effective action of a group 332 

Eisenstein irreducibility criterion 259 

Elementary abelian group 389 

-- divisors of a finite abelian group 380, 
382 

-- matrices 100 

Endomorphism of a group 170 

Epimorphism 171, 189 

Equivalence class 45 

-- relation 55

Equivalent group actions 341 

-- (isomorphic) group representations 392 

Euclidean algorithm 244 

= Seales, ahi 

Euler's formula 214 

=~) functions 964) 487, 

-- identity 279

-- theorem 507 

Exponent of a group 383 


Fermat's little theorem 200, 404 
Fibonacci numbers 28 

Field extension 198 

Finite algebraic extension 475-476 
Formal power series 235, 237 
Fraction 197 

Free group of finite rank 358, 366 
Full matrix ring 184 

Fundamental system of solutions 96 
-- theorem of algebra 297 

-- theorem of arithmetic 55

-- theorem on ring homomorphisms 194 


Gauss's lemma 249 

-- method of successive elimination 27 
General linear group 145 

Generalized character of a group 460 
Graph of a function 44

Group 145 

-- actions on a set 332, 335

-- algebra of a finite group 532 

-- Galois group 14

-- given by generators and relations 360


Group, dihedral 361 

-- of characters of an abelian group 439
-- of inner automorphisms 169

-- of motions of a space 349 

-- of outer automorphisms 364 

-- of quaternions 363 

-- transformation groups 148 


Hamilton-Cayley theorem 545 
Harmonic polynomial 450 
Hermitian matrix 404
Homomorphism 169, 189, 525 
Hurwitz-Routh criterion 316 


Ideal in a ring 189 

Identity matrix 19 

Indecomposable representation 395 

Index of a subgroup 175 

Injective mapping 38 

Integral domain 196 

Invariant subset under a group action 342 

-- subspace 395 

Invariants of a finite abelian group 
Be S82 : 

-- of a linear group 463 

-- of a quadratic form 465 

Inverse matrix 87 

Inversion relative to a permutation 163 

Irreducible polynomial 238, 247 

-- representation 395 

Isolating the roots of a polynomial 309

Isomorphism 166, 189, 393 


Jacobi's identity 540 
Jordan cell 394 
-- normal form of a matrix 394, 547 


Kernel of a group action 332 
-- of a homomorphism 169, 189 
-- of a linear map 94 

-- of a representation 391 
Kronecker symbol 84 
Kronecker-Capelli theorem 76 


Lagrange interpolation formula 268 
Laplace operator 449 

Legendre symbol 510 

Leibniz’ rule 271 

Length of an orbit 333 

Linear group 323 

-- group representation 391 

-- manifold 97

-- map (transformation) 79 
Linearly ordered set 48 


Matrix of a linear map 79 

-- of a permutation 179 

Maximal element 49 

-- ideal in a ring 497

Method of undetermined coefficients 
284 

Minimal element 49 

-- polynomial of an element 475 

-- polynomial of a linear operator 545 

Minor of a matrix 108 


260, 




Mobius function 485 

-- inversion formula 487 

Module of finite type 515

-- of a group representation 534

-- over a Lie algebra 540 

-- over a ring 512 

-- torsion-free 515 

Modulus of a congruence 187 

de Moivre's formula 214 

Monic polynomial 233 

Monoid 139 

Monomorphism 171, 189 

Morphism 171 

Multiple root 265 

Multiplicative function 486 

-- group of residue rings 506 

Multiplicity of occurrence of an irreduc- 
ible component of a representation 411 

-- of a weight 542 

Multiply transitive group 338 


Newton interpolation formula 268 
Newton's formulas 287 

Nilpotent element in a ring 206 

-- operator 545 

Nonsingularity criterion for matrices 87 
-- criterion for a transformation 87 
Normal (invariant) subgroup 170 
Normalizer of a subgroup 335 


Or: akteme a7 

Order of an element 152 

Orthogonal complement 499 

== (scour: 3125) 

Orthogonality relation 428-429, 438 


Partition of a set 45 

Permutation 153 

Point-wise operation 185 

Polynomial 225 

—— inbboleeatioym 21a7/ 

Primitive element of a field extension 472 
-- polynomial 249 

=~ se rOOt (Ok SOME meso 

-- root modulo n 507 

Principal ideal 190 

Principle of induction 50 
Projective special linear group 373 


Quadratic field 218 
Quaternions 529 
Quotient algebra 525 
=— feseoyunoy IVS! 

-- module 513 

-- representation 395 
== joie IS 

-- set 46 


Rank of a matrix 73, 74 

Reduced polynomial 267, 277 

Reducible representation 395 

Regular representation 400

Representation of an algebra over a 
deve ices 247) 

-- of a group 331




Representation space 391 Wedderburn's theorem 526 
Residue classes modulo an ideal 193 Wilson's theorem 277 
Ring 182-183 

-- division ring 197 

-- of characters 459 

-- of endomorphisms of an abelian group 513

-- of functions 184 

-- of Gaussian integers 496 

-- of integers 184, 522 

-- of linear operators 517

-- of polynomials 225, 229 

-- of residue classes 187 

-- principal ideal domain 497

Root of a polynomial 263 

-- subspace 548 


Schur's lemma 421, 538 

Semidirect product 356 

Semigroup 139

Semilinear function 109 

Sign of a permutation 160 

Simple group 351 

== Gnvetaliuuiles: Sy1l7/ 

Skew field (division ring) 197 
Skew-hermitian matrix 331, 541 
Skew-symmetric function 109, 159, 296
Solvable group 350-351 

Special linear group 147, 323 

-- orthogonal group 323

-- unitary group 325 

Spectrum of a matrix 544 

Spherical functions 453 

Splitting field of a polynomial 301, 480 
Square matrix 19, 84 

Stabilizer of a point 333 

Stable polynomial 315 

Stationary subgroup of a point 333 
Steinitz' theorem 297 

Structure constants 460 
Subrepresentation 395 

Surjective map 38 

Sylow theorems 368 

Symmetric group 153 

-- polynomials 280 

Symmetry groups for regular polyhedra 416 


Taylor's formula 318 

Tensor product of representations 458 
-- product of vector spaces 455 
Torsion 515 

Tower of extensions 474 
Transcendental element 227, 473 
Transitive group 337 

Transpose matrix 120-121 
Transposition 153 

Type of a finite abelian group 382 


Unimodular group 147 

Unique factorization domain (factorial 
ring) 238, 496 

Unitary group 323 

-- representation 405


Vandermonde determinant 122 
Vieta's formulas 275 




Universitext 
Editors: F.W. Gehring, P.R. Halmos, C.C. Moore 


Chern: Complex Manifolds Without Potential Theory 

Chorin/Marsden: A Mathematical Introduction to Fluid Mechanics 

Cohn: A Classical Invitation to Algebraic Numbers and Class Fields 

Curtis: Matrix Groups 

van Dalen: Logic and Structure 

Devlin: Fundamentals of Contemporary Set Theory 

Edwards: A Formal Background to Mathematics 1: Logic, Sets and
Numbers
Edwards: A Formal Background to Mathematics 2: A Critical Approach to 
Elementary Analysis 

Frauenthal: Mathematical Modeling in Epidemiology

Fuller: FORTRAN Programming: A Supplement for Calculus Courses 

Gardiner: A First Course in Group Theory 

Greub: Multilinear Algebra 

Hajek/Havranek: Mechanizing Hypothesis Formation: Mathematical 
Foundations for a General Theory 

Hermes: Introduction to Mathematical Logic 

Kalbfleisch: Probability and Statistical Inference I/II

Kelly/Matthews: The Non-Euclidean, Hyperbolic Plane: Its Structure and 
Consistency 

Kostrikin: Introduction to Algebra 

Lu: Singularity Theory and an Introduction to Catastrophe Theory 

Marcus: Number Fields 

Meyer: Essential Mathematics for Applied Fields 

Moise: Introductory Problem Courses in Analysis and Topology

Oden/Reddy: Variational Methods in Theoretical Mechanics 

Reisel: Elementary Theory of Metric Spaces: A Course in Constructing
Mathematical Proofs
Rickart: Natural Function Algebras 
Schreiber: Differential Forms: A Heuristic Introduction 

